This is an automated email from the ASF dual-hosted git repository. liaoxin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 2a1b51d18a4 [opt](load) add load faq (#2333) 2a1b51d18a4 is described below commit 2a1b51d18a48d580b1d3d6fdb40c7b8f6cee846c Author: Xin Liao <liao...@selectdb.com> AuthorDate: Tue Apr 29 09:51:59 2025 +0800 [opt](load) add load faq (#2333) --- docs/faq/load-faq.md | 140 +++++++++++++++ docs/faq/routineload-faq.md | 55 -------- .../faq/{routineload-faq.md => load-faq.md} | 97 +++++++++++++- .../faq/{routineload-faq.md => load-faq.md} | 97 +++++++++++++- .../faq/{routineload-faq.md => load-faq.md} | 97 +++++++++++++- sidebars.json | 2 +- versioned_docs/version-2.1/faq/load-faq.md | 140 +++++++++++++++ versioned_docs/version-2.1/faq/routineload-faq.md | 55 -------- versioned_docs/version-3.0/faq/load-faq.md | 140 +++++++++++++++ versioned_docs/version-3.0/faq/routineload-faq.md | 55 -------- versioned_sidebars/version-2.1-sidebars.json | 2 +- versioned_sidebars/version-3.0-sidebars.json | 2 +- 12 files changed, 696 insertions(+), 186 deletions(-) diff --git a/docs/faq/load-faq.md b/docs/faq/load-faq.md new file mode 100644 index 00000000000..c9fe8f19ef3 --- /dev/null +++ b/docs/faq/load-faq.md @@ -0,0 +1,140 @@ +--- +{ + "title": "Load FAQ", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +## General Load FAQ + +### Error "[DATA_QUALITY_ERROR] Encountered unqualified data" +**Problem Description**: Data quality error during loading. + +**Solution**: +- Stream Load and Insert Into operations will return an error URL, while for Broker Load you can check the error URL through the `Show Load` command. +- Use a browser or curl command to access the error URL to view the specific data quality error reasons. +- Use the `strict_mode` and `max_filter_ratio` parameters to control the acceptable error rate. + +### Error "[E-235] Failed to init rowset builder" +**Problem Description**: Error -235 occurs when the load frequency is too high and data hasn't been compacted in time, exceeding version limits. + +**Solution**: +- Increase the batch size of data loading and reduce loading frequency. +- Increase the `max_tablet_version_num` parameter in `be.conf`; it is recommended not to exceed 5000. + +### Error "[E-238] Too many segments in rowset" +**Problem Description**: Error -238 occurs when the number of segments under a single rowset exceeds the limit. + +**Common Causes**: +- The bucket number configured during table creation is too small. +- Data skew occurs; consider using more balanced bucket keys. + +### Error "Transaction commit successfully, BUT data will be visible later" +**Problem Description**: Data load is successful but temporarily not visible. + +**Cause**: Usually due to transaction publish delay caused by system resource pressure.
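+ +To confirm when the data becomes visible, you can poll the transaction status. A minimal sketch, assuming the load response returned TxnId 4005 and the target database is named example_db (both hypothetical): +```sql +-- The data becomes readable once TransactionStatus reaches VISIBLE. +SHOW TRANSACTION FROM example_db WHERE id = 4005; +```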
+ +### Error "Failed to commit kv txn [...] Transaction exceeds byte limit" +**Problem Description**: In shared-nothing mode, too many partitions and tablets are involved in a single load, exceeding the transaction size limit. + +**Solution**: +- Load data by partition in batches to reduce the number of partitions involved in a single load. +- Optimize table structure to reduce the number of partitions and tablets. + +### Extra "\r" in the last column of CSV file +**Problem Description**: Usually caused by Windows line endings. + +**Solution**: +Specify the correct line delimiter: `-H "line_delimiter:\r\n"` + +### CSV data with quotes imported as null +**Problem Description**: CSV data with quotes becomes null after import. + +**Solution**: +Use the `trim_double_quotes` parameter to remove double quotes around fields. + +## Stream Load + +### Reasons for Slow Loading +- Bottlenecks in CPU, IO, memory, or network card resources. +- Slow network between client machine and BE machines, can be initially diagnosed through ping latency from client to BE machines. +- Webserver thread count bottleneck, too many concurrent Stream Loads on a single BE (exceeding be.conf webserver_num_workers configuration) may cause thread count bottleneck. +- Memtable Flush thread count bottleneck, check BE metrics doris_be_flush_thread_pool_queue_size to see if queuing is severe. Can be resolved by increasing the be.conf flush_thread_num_per_store parameter. + +### Handling Special Characters in Column Names +When column names contain special characters, use single quotes with backticks to specify the columns parameter: +```shell +curl --location-trusted -u root:"" \ + -H 'columns:`@coltime`,colint,colvar' \ + -T a.csv \ + -H "column_separator:," \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +``` + +## Routine Load + +### Major Bug Fixes + +| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR | +|------------------|-------------------|--------------|-------------------|------------------|----------------|---------| +| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | <2.1.9 <3.0.5 | 2.1.9 3.0.5 | [#47530](https://github.com/apache/doris/pull/47530) | +| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | [#46149](https://github.com/apache/doris/pull/46149) | +| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches. | Shared-nothing and shared-storage | Pause the Routine Load job and execute the following command: `ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false");` | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#45528](https://github.com/apache/doris/pull/45528), [#4494 [...] +| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. 
| <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#44913](https://github.com/apache/doris/pull/44913) | +| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | +| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | + +### Default Configuration Optimizations + +| Optimization Content | Applied Versions | Corresponding PR | +|---------------------|------------------|------------------| +| Increased the timeout duration for Routine Load. | 2.1.7 3.0.3 | [#42042](https://github.com/apache/doris/pull/42042), [#40818](https://github.com/apache/doris/pull/40818) | +| Adjusted the default value of `max_batch_interval`. | 2.1.8 3.0.3 | [#42491](https://github.com/apache/doris/pull/42491) | +| Removed the restriction on `max_batch_interval`. | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | +| Adjusted the default values of `max_batch_rows` and `max_batch_size`. | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | + +### Observability Optimizations + +| Optimization Content | Applied Versions | Corresponding PR | +|---------------------|------------------|------------------| +| Added observability-related metrics. | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | + +### Error "failed to get latest offset" +**Problem Description**: Routine Load cannot get the latest Kafka offset. + +**Common Causes**: +- Usually due to network connectivity issues with Kafka. Verify by pinging or using telnet to test the Kafka domain name. +- Timeout caused by third-party library bug, error: java.util.concurrent.TimeoutException: Waited X seconds + +### Error "failed to get partition meta: Local:'Broker transport failure" +**Problem Description**: Routine Load cannot get Kafka Topic Partition Meta. + +**Common Causes**: +- Usually due to network connectivity issues with Kafka. Verify by pinging or using telnet to test the Kafka domain name. +- If using domain names, try configuring domain name mapping in /etc/hosts + +### Error "Broker: Offset out of range" +**Problem Description**: The consumed offset doesn't exist in Kafka, possibly because it has been cleaned up by Kafka. + +**Solution**: +- Need to specify a new offset for consumption, for example, set offset to OFFSET_BEGINNING. +- Need to set appropriate Kafka log cleanup parameters based on import speed: log.retention.hours, log.retention.bytes, etc. \ No newline at end of file diff --git a/docs/faq/routineload-faq.md b/docs/faq/routineload-faq.md deleted file mode 100644 index 3960f67bfe8..00000000000 --- a/docs/faq/routineload-faq.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -{ - "title": "Routine Load FAQ", - "language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. 
You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -# Routine Load FAQ - -This document records common issues, bug fixes, and optimization improvements related to Routine Load in Doris. It will be updated periodically. - -## Major Bug Fixes - -| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR | -| ----------------------------------------------------------- | ------------------------------------------- | ----------------- | ---------------------------------------------------------- | ----------------- | -------------- | ---------------------------------------------------------- | -| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | <2.1.9 <3.0.5 | 2.1.9 3.0.5 | [#47530](https://github.com/apache/doris/pull/47530) | -| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | [#46149](https://github.com/apache/doris/pull/46149) | -| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches. | Shared-nothing and shared-storage | Pause the Routine Load job and execute the following command: `ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false");` | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#45528](https://github.com/apache/doris/pull/45528), [ [...] -| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#44913](https://github.com/apache/doris/pull/44913) | -| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | -| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | - -## Default Configuration Optimizations - -| Optimization Content | Applied Versions | Corresponding PR | -| ------------------------------------------- | ---------------- | ---------------------------------------------------------- | -| Increased the timeout duration for Routine Load. | 2.1.7 3.0.3 | [#42042](https://github.com/apache/doris/pull/42042), [#40818](https://github.com/apache/doris/pull/40818) | -| Adjusted the default value of `max_batch_interval`. | 2.1.8 3.0.3 | [#42491](https://github.com/apache/doris/pull/42491) | -| Removed the restriction on `max_batch_interval`. 
| 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | -| Adjusted the default values of `max_batch_rows` and `max_batch_size`. | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | - -## Observability Optimizations - -| Optimization Content | Applied Versions | Corresponding PR | -| ---------------------------- | ---------------- | ---------------------------------------------------------- | -| Added observability-related metrics. | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/routineload-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/load-faq.md similarity index 54% rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/routineload-faq.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/load-faq.md index 84891e72d74..35d5ffbc946 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/routineload-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/load-faq.md @@ -1,6 +1,6 @@ --- { - "title": "Routine Load 常见问题", + "title": "常见导入问题", "language": "zh-CN" } --- @@ -24,11 +24,75 @@ specific language governing permissions and limitations under the License. --> -# Routine Load 常见问题 +## 导入通用问题 -本文档记录了 Doris 在使用过程中与 Routine Load 相关的常见问题、Bug 修复及优化改进,并将不定期更新。 +### 报错“[DATA_QUALITY_ERROR] Encountered unqualified data” +**问题描述**:导入报数据质量错误。 -## 较严重的 Bug 修复 +**解决方案**: +- Stream Load 和 Insert Into 结果中会返回错误 URL,Broker Load 可通过 `Show Load` 命令查看对应错误 URL。 +- 通过浏览器或 curl 命令访问错误 URL 查看具体的数据质量错误原因。 +- 通过 `strict_mode` 和 `max_filter_ratio` 参数控制能容忍的错误率。 + +### 报错“[E-235] Failed to init rowset builder” +**问题描述**:-235 错误是因为导入频率过高,数据未能及时 compaction,超过版本限制。 + +**解决方案**: +- 增加每批次导入数据量,降低导入频率。 +- 在 `be.conf` 中调大 `max_tablet_version_num` 参数,建议不超过 5000。 + +### 报错“[E-238] Too many segments in rowset” +**问题描述**:-238 错误是因为单个 rowset 下的 segment 数量超限。 + +**常见原因**: +- 建表时 bucket 数配置过小。 +- 数据出现倾斜,建议使用更均衡的分桶键。 + +### 报错“Transaction commit successfully, BUT data will be visible later” +**问题描述**:数据导入成功但暂时不可见。 + +**原因**:通常是由于系统资源压力导致事务 publish 延迟。 + +### 报错“Failed to commit kv txn [...]
Transaction exceeds byte limit” +**问题描述**:存算分离模式下,单次导入涉及的 partition 和 tablet 过多,超过事务大小的限制。 + +**解决方案**: +- 按 partition 分批导入数据,减少单次导入涉及的 partition 数量。 +- 优化表结构,减少 partition 和 tablet 数量。 + +### CSV 文件最后一列出现额外的 "\r" +**问题描述**:通常是 Windows 换行符导致。 + +**解决方案**: +指定正确的换行符:`-H "line_delimiter:\r\n"` + +### CSV 带引号数据导入为 null +**问题描述**:带引号的 CSV 数据导入后值变为 null。 + +**解决方案**: +使用 `trim_double_quotes` 参数去除字段外层双引号。 + +## Stream Load + +### 导入慢的原因 +- CPU、IO、内存、网卡资源有瓶颈。 +- 客户端机器到 BE 机器网络慢,可通过客户端到 BE 机器的 ping 时延做初步判断。 +- Webserver 线程数瓶颈,单 BE 上 Stream Load 并发数太高(超过 be.conf 中 webserver_num_workers 配置)可能导致线程数瓶颈。 +- Memtable Flush 线程数瓶颈,可通过 BE metrics 中的 doris_be_flush_thread_pool_queue_size 查看排队是否比较严重,适当调大 be.conf 中 flush_thread_num_per_store 参数可以解决。 + +### 特殊字符列名处理 +列名中含有特殊字符时需要使用单引号配合反引号方式指定 columns 参数: +```shell +curl --location-trusted -u root:"" \ + -H 'columns:`@coltime`,colint,colvar' \ + -T a.csv \ + -H "column_separator:," \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +``` + +## Routine Load + +### 较严重的 Bug 修复 | 问题描述 | 发生条件 | 影响范围 | 临时解决方案 | 受影响版本 | 修复版本 | 修复 PR | | ---------------------------------------------------------- | ------------------------------------------ | ---------------- | ---------------------------------------------------------- | ------------- | ----------- | ---------------------------------------------------------- | @@ -39,7 +103,7 @@ under the License. | Routine Load 调度卡住 | 当 FE 向 Meta Service 中止事务时发生超时 | 存算分离 | 重启 FE 节点。 | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | | Routine Load 重启问题 | 重启 BE 节点 | 存算分离存算一体 | 手动恢复 Job。 | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | -## 默认配置优化 +### 默认配置优化 | 优化内容 | 合入版本 | 对应 PR | | ---------------------------------------- | ---------- | ---------------------------------------------------------- | @@ -48,8 +112,29 @@ under the License.
| 移除了 max_batch_interval 的限制 | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | | 调整了 max_batch_rows 和 max_batch_size 的默认值 | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | -## 可观测优化 +### 可观测优化 | 优化内容 | 合入版本 | 对应 PR | | ----------------------- | -------- | ---------------------------------------------------------- | | 增加了可观测性相关的 Metrics 指标 | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | + +### 报错“failed to get latest offset” +**问题描述**:Routine Load 无法获取 Kafka 最新的 Offset。 + +**常见原因**: +- 一般是到 Kafka 的网络不通,可通过 ping 或 telnet Kafka 的域名确认。 +- 三方库的 bug 导致的获取超时,错误为:java.util.concurrent.TimeoutException: Waited X seconds + +### 报错“failed to get partition meta: Local:'Broker transport failure” +**问题描述**:Routine Load 无法获取 Kafka Topic 的 Partition Meta。 + +**常见原因**: +- 一般是到 Kafka 的网络不通,可通过 ping 或 telnet Kafka 的域名确认。 +- 如果使用的是域名方式,可以在 /etc/hosts 中配置域名映射。 + +### 报错“Broker: Offset out of range” +**问题描述**:消费的 offset 在 Kafka 中不存在,可能是因为该 offset 已经被 Kafka 清理掉了。 + +**解决方案**: +- 需要重新指定 offset 进行消费,例如可以指定 offset 为 OFFSET_BEGINNING。 +- 需要根据导入速度设置合理的 Kafka log 清理参数:log.retention.hours、log.retention.bytes 等。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/routineload-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/load-faq.md similarity index 54% rename from i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/routineload-faq.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/load-faq.md index 84891e72d74..35d5ffbc946 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/routineload-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/faq/load-faq.md @@ -1,6 +1,6 @@ --- { - "title": "Routine Load 常见问题", + "title": "常见导入问题", "language": "zh-CN" } --- @@ -24,11 +24,75 @@ specific language governing permissions and limitations under the License. --> -# Routine Load 常见问题 +## 导入通用问题 -本文档记录了 Doris 在使用过程中与 Routine Load 相关的常见问题、Bug 修复及优化改进,并将不定期更新。 +### 报错“[DATA_QUALITY_ERROR] Encountered unqualified data” +**问题描述**:导入报数据质量错误。 -## 较严重的 Bug 修复 +**解决方案**: +- Stream Load 和 Insert Into 结果中会返回错误 URL,Broker Load 可通过 `Show Load` 命令查看对应错误 URL。 +- 通过浏览器或 curl 命令访问错误 URL 查看具体的数据质量错误原因。 +- 通过 `strict_mode` 和 `max_filter_ratio` 参数控制能容忍的错误率。 + +### 报错“[E-235] Failed to init rowset builder” +**问题描述**:-235 错误是因为导入频率过高,数据未能及时 compaction,超过版本限制。 + +**解决方案**: +- 增加每批次导入数据量,降低导入频率。 +- 在 `be.conf` 中调大 `max_tablet_version_num` 参数,建议不超过 5000。 + +### 报错“[E-238] Too many segments in rowset” +**问题描述**:-238 错误是因为单个 rowset 下的 segment 数量超限。 + +**常见原因**: +- 建表时 bucket 数配置过小。 +- 数据出现倾斜,建议使用更均衡的分桶键。 + +### 报错“Transaction commit successfully, BUT data will be visible later” +**问题描述**:数据导入成功但暂时不可见。 + +**原因**:通常是由于系统资源压力导致事务 publish 延迟。 + +### 报错“Failed to commit kv txn [...]
Transaction exceeds byte limit” +**问题描述**:存算分离模式下,单次导入涉及的 partition 和 tablet 过多,超过事务大小的限制。 + +**解决方案**: +- 按 partition 分批导入数据,减少单次导入涉及的 partition 数量。 +- 优化表结构,减少 partition 和 tablet 数量。 + +### CSV 文件最后一列出现额外的 "\r" +**问题描述**:通常是 Windows 换行符导致。 + +**解决方案**: +指定正确的换行符:`-H "line_delimiter:\r\n"` + +### CSV 带引号数据导入为 null +**问题描述**:带引号的 CSV 数据导入后值变为 null。 + +**解决方案**: +使用 `trim_double_quotes` 参数去除字段外层双引号。 + +## Stream Load + +### 导入慢的原因 +- CPU、IO、内存、网卡资源有瓶颈。 +- 客户端机器到 BE 机器网络慢,可通过客户端到 BE 机器的 ping 时延做初步判断。 +- Webserver 线程数瓶颈,单 BE 上 Stream Load 并发数太高(超过 be.conf 中 webserver_num_workers 配置)可能导致线程数瓶颈。 +- Memtable Flush 线程数瓶颈,可通过 BE metrics 中的 doris_be_flush_thread_pool_queue_size 查看排队是否比较严重,适当调大 be.conf 中 flush_thread_num_per_store 参数可以解决。 + +### 特殊字符列名处理 +列名中含有特殊字符时需要使用单引号配合反引号方式指定 columns 参数: +```shell +curl --location-trusted -u root:"" \ + -H 'columns:`@coltime`,colint,colvar' \ + -T a.csv \ + -H "column_separator:," \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +``` + +## Routine Load + +### 较严重的 Bug 修复 | 问题描述 | 发生条件 | 影响范围 | 临时解决方案 | 受影响版本 | 修复版本 | 修复 PR | | ---------------------------------------------------------- | ------------------------------------------ | ---------------- | ---------------------------------------------------------- | ------------- | ----------- | ---------------------------------------------------------- | @@ -39,7 +103,7 @@ under the License. | Routine Load 调度卡住 | 当 FE 向 Meta Service 中止事务时发生超时 | 存算分离 | 重启 FE 节点。 | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | | Routine Load 重启问题 | 重启 BE 节点 | 存算分离存算一体 | 手动恢复 Job。 | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | -## 默认配置优化 +### 默认配置优化 | 优化内容 | 合入版本 | 对应 PR | | ---------------------------------------- | ---------- | ---------------------------------------------------------- | @@ -48,8 +112,29 @@ under the License.
| 移除了 max_batch_interval 的限制 | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | | 调整了 max_batch_rows 和 max_batch_size 的默认值 | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | -## 可观测优化 +### 可观测优化 | 优化内容 | 合入版本 | 对应 PR | | ----------------------- | -------- | ---------------------------------------------------------- | | 增加了可观测性相关的 Metrics 指标 | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | + +### 报错“failed to get latest offset” +**问题描述**:Routine Load 无法获取 Kafka 最新的 Offset。 + +**常见原因**: +- 一般是到 Kafka 的网络不通,可通过 ping 或 telnet Kafka 的域名确认。 +- 三方库的 bug 导致的获取超时,错误为:java.util.concurrent.TimeoutException: Waited X seconds + +### 报错“failed to get partition meta: Local:'Broker transport failure” +**问题描述**:Routine Load 无法获取 Kafka Topic 的 Partition Meta。 + +**常见原因**: +- 一般是到 Kafka 的网络不通,可通过 ping 或 telnet Kafka 的域名确认。 +- 如果使用的是域名方式,可以在 /etc/hosts 中配置域名映射。 + +### 报错“Broker: Offset out of range” +**问题描述**:消费的 offset 在 Kafka 中不存在,可能是因为该 offset 已经被 Kafka 清理掉了。 + +**解决方案**: +- 需要重新指定 offset 进行消费,例如可以指定 offset 为 OFFSET_BEGINNING。 +- 需要根据导入速度设置合理的 Kafka log 清理参数:log.retention.hours、log.retention.bytes 等。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/routineload-faq.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/load-faq.md similarity index 54% rename from i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/routineload-faq.md rename to i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/load-faq.md index 84891e72d74..35d5ffbc946 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/routineload-faq.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/faq/load-faq.md @@ -1,6 +1,6 @@ --- { - "title": "Routine Load 常见问题", + "title": "常见导入问题", "language": "zh-CN" } --- @@ -24,11 +24,75 @@ specific language governing permissions and limitations under the License. --> -# Routine Load 常见问题 +## 导入通用问题 -本文档记录了 Doris 在使用过程中与 Routine Load 相关的常见问题、Bug 修复及优化改进,并将不定期更新。 +### 报错“[DATA_QUALITY_ERROR] Encountered unqualified data” +**问题描述**:导入报数据质量错误。 -## 较严重的 Bug 修复 +**解决方案**: +- Stream Load 和 Insert Into 结果中会返回错误 URL,Broker Load 可通过 `Show Load` 命令查看对应错误 URL。 +- 通过浏览器或 curl 命令访问错误 URL 查看具体的数据质量错误原因。 +- 通过 `strict_mode` 和 `max_filter_ratio` 参数控制能容忍的错误率。 + +### 报错“[E-235] Failed to init rowset builder” +**问题描述**:-235 错误是因为导入频率过高,数据未能及时 compaction,超过版本限制。 + +**解决方案**: +- 增加每批次导入数据量,降低导入频率。 +- 在 `be.conf` 中调大 `max_tablet_version_num` 参数,建议不超过 5000。 + +### 报错“[E-238] Too many segments in rowset” +**问题描述**:-238 错误是因为单个 rowset 下的 segment 数量超限。 + +**常见原因**: +- 建表时 bucket 数配置过小。 +- 数据出现倾斜,建议使用更均衡的分桶键。 + +### 报错“Transaction commit successfully, BUT data will be visible later” +**问题描述**:数据导入成功但暂时不可见。 + +**原因**:通常是由于系统资源压力导致事务 publish 延迟。 + +### 报错“Failed to commit kv txn [...]
Transaction exceeds byte limit” +**问题描述**:存算分离模式下,单次导入涉及的 partition 和 tablet 过多,超过事务大小的限制。 + +**解决方案**: +- 按 partition 分批导入数据,减少单次导入涉及的 partition 数量。 +- 优化表结构,减少 partition 和 tablet 数量。 + +### CSV 文件最后一列出现额外的 "\r" +**问题描述**:通常是 Windows 换行符导致。 + +**解决方案**: +指定正确的换行符:`-H "line_delimiter:\r\n"` + +### CSV 带引号数据导入为 null +**问题描述**:带引号的 CSV 数据导入后值变为 null。 + +**解决方案**: +使用 `trim_double_quotes` 参数去除字段外层双引号。 + +## Stream Load + +### 导入慢的原因 +- CPU、IO、内存、网卡资源有瓶颈。 +- 客户端机器到 BE 机器网络慢,可通过客户端到 BE 机器的 ping 时延做初步判断。 +- Webserver 线程数瓶颈,单 BE 上 Stream Load 并发数太高(超过 be.conf 中 webserver_num_workers 配置)可能导致线程数瓶颈。 +- Memtable Flush 线程数瓶颈,可通过 BE metrics 中的 doris_be_flush_thread_pool_queue_size 查看排队是否比较严重,适当调大 be.conf 中 flush_thread_num_per_store 参数可以解决。 + +### 特殊字符列名处理 +列名中含有特殊字符时需要使用单引号配合反引号方式指定 columns 参数: +```shell +curl --location-trusted -u root:"" \ + -H 'columns:`@coltime`,colint,colvar' \ + -T a.csv \ + -H "column_separator:," \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +``` + +## Routine Load + +### 较严重的 Bug 修复 | 问题描述 | 发生条件 | 影响范围 | 临时解决方案 | 受影响版本 | 修复版本 | 修复 PR | | ---------------------------------------------------------- | ------------------------------------------ | ---------------- | ---------------------------------------------------------- | ------------- | ----------- | ---------------------------------------------------------- | @@ -39,7 +103,7 @@ under the License. | Routine Load 调度卡住 | 当 FE 向 Meta Service 中止事务时发生超时 | 存算分离 | 重启 FE 节点。 | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | | Routine Load 重启问题 | 重启 BE 节点 | 存算分离存算一体 | 手动恢复 Job。 | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | -## 默认配置优化 +### 默认配置优化 | 优化内容 | 合入版本 | 对应 PR | | ---------------------------------------- | ---------- | ---------------------------------------------------------- | @@ -48,8 +112,29 @@ under the License.
| 移除了 max_batch_interval 的限制 | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | | 调整了 max_batch_rows 和 max_batch_size 的默认值 | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | -## 可观测优化 +### 可观测优化 | 优化内容 | 合入版本 | 对应 PR | | ----------------------- | -------- | ---------------------------------------------------------- | | 增加了可观测性相关的 Metrics 指标 | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | + +### 报错“failed to get latest offset” +**问题描述**:Routine Load 无法获取 Kafka 最新的 Offset。 + +**常见原因**: +- 一般是到 Kafka 的网络不通,可通过 ping 或 telnet Kafka 的域名确认。 +- 三方库的 bug 导致的获取超时,错误为:java.util.concurrent.TimeoutException: Waited X seconds + +### 报错“failed to get partition meta: Local:'Broker transport failure” +**问题描述**:Routine Load 无法获取 Kafka Topic 的 Partition Meta。 + +**常见原因**: +- 一般是到 Kafka 的网络不通,可通过 ping 或 telnet Kafka 的域名确认。 +- 如果使用的是域名方式,可以在 /etc/hosts 中配置域名映射。 + +### 报错“Broker: Offset out of range” +**问题描述**:消费的 offset 在 Kafka 中不存在,可能是因为该 offset 已经被 Kafka 清理掉了。 + +**解决方案**: +- 需要重新指定 offset 进行消费,例如可以指定 offset 为 OFFSET_BEGINNING。 +- 需要根据导入速度设置合理的 Kafka log 清理参数:log.retention.hours、log.retention.bytes 等。 \ No newline at end of file diff --git a/sidebars.json b/sidebars.json index 87a9b6b274a..6ceb86500ed 100644 --- a/sidebars.json +++ b/sidebars.json @@ -828,7 +828,7 @@ "faq/lakehouse-faq", "faq/bi-faq", "faq/correctness-faq", - "faq/routineload-faq" + "faq/load-faq" ] }, { diff --git a/versioned_docs/version-2.1/faq/load-faq.md b/versioned_docs/version-2.1/faq/load-faq.md new file mode 100644 index 00000000000..c9fe8f19ef3 --- /dev/null +++ b/versioned_docs/version-2.1/faq/load-faq.md @@ -0,0 +1,140 @@ +--- +{ + "title": "Load FAQ", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +## General Load FAQ + +### Error "[DATA_QUALITY_ERROR] Encountered unqualified data" +**Problem Description**: Data quality error during loading. + +**Solution**: +- Stream Load and Insert Into operations will return an error URL, while for Broker Load you can check the error URL through the `Show Load` command. +- Use a browser or curl command to access the error URL to view the specific data quality error reasons. +- Use the `strict_mode` and `max_filter_ratio` parameters to control the acceptable error rate. + +### Error "[E-235] Failed to init rowset builder" +**Problem Description**: Error -235 occurs when the load frequency is too high and data hasn't been compacted in time, exceeding version limits. + +**Solution**: +- Increase the batch size of data loading and reduce loading frequency (see the sketch below). +- Increase the `max_tablet_version_num` parameter in `be.conf`; it is recommended not to exceed 5000.
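+ +As a sketch of the first point, recent Doris versions can also merge many small Stream Loads on the server side via group commit, so frequent small writes no longer produce one version each; the endpoint, credentials, and file name below are placeholders: +```shell +# Hypothetical Stream Load with group commit enabled: small writes are +# merged server-side, which reduces tablet version pressure. +curl --location-trusted -u root:"" \ + -H "group_commit:async_mode" \ + -H "column_separator:," \ + -T small_batch.csv \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +```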
+ +### Error "[E-238] Too many segments in rowset" +**Problem Description**: Error -238 occurs when the number of segments under a single rowset exceeds the limit. + +**Common Causes**: +- The bucket number configured during table creation is too small. +- Data skew occurs; consider using more balanced bucket keys. + +### Error "Transaction commit successfully, BUT data will be visible later" +**Problem Description**: Data load is successful but temporarily not visible. + +**Cause**: Usually due to transaction publish delay caused by system resource pressure. + +### Error "Failed to commit kv txn [...] Transaction exceeds byte limit" +**Problem Description**: In shared-nothing mode, too many partitions and tablets are involved in a single load, exceeding the transaction size limit. + +**Solution**: +- Load data by partition in batches to reduce the number of partitions involved in a single load. +- Optimize table structure to reduce the number of partitions and tablets. + +### Extra "\r" in the last column of CSV file +**Problem Description**: Usually caused by Windows line endings. + +**Solution**: +Specify the correct line delimiter: `-H "line_delimiter:\r\n"` + +### CSV data with quotes imported as null +**Problem Description**: CSV data with quotes becomes null after import. + +**Solution**: +Use the `trim_double_quotes` parameter to remove double quotes around fields. + +## Stream Load + +### Reasons for Slow Loading +- Bottlenecks in CPU, IO, memory, or network card resources. +- Slow network between client machine and BE machines, can be initially diagnosed through ping latency from client to BE machines. +- Webserver thread count bottleneck, too many concurrent Stream Loads on a single BE (exceeding be.conf webserver_num_workers configuration) may cause thread count bottleneck. +- Memtable Flush thread count bottleneck, check BE metrics doris_be_flush_thread_pool_queue_size to see if queuing is severe. Can be resolved by increasing the be.conf flush_thread_num_per_store parameter. + +### Handling Special Characters in Column Names +When column names contain special characters, use single quotes with backticks to specify the columns parameter: +```shell +curl --location-trusted -u root:"" \ + -H 'columns:`@coltime`,colint,colvar' \ + -T a.csv \ + -H "column_separator:," \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +``` + +## Routine Load + +### Major Bug Fixes + +| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR | +|------------------|-------------------|--------------|-------------------|------------------|----------------|---------| +| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | <2.1.9 <3.0.5 | 2.1.9 3.0.5 | [#47530](https://github.com/apache/doris/pull/47530) | +| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | [#46149](https://github.com/apache/doris/pull/46149) | +| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches. 
| Shared-nothing and shared-storage | Pause the Routine Load job and execute the following command: `ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false");` | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#45528](https://github.com/apache/doris/pull/45528), [#4494 [...] +| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#44913](https://github.com/apache/doris/pull/44913) | +| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | +| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | + +### Default Configuration Optimizations + +| Optimization Content | Applied Versions | Corresponding PR | +|---------------------|------------------|------------------| +| Increased the timeout duration for Routine Load. | 2.1.7 3.0.3 | [#42042](https://github.com/apache/doris/pull/42042), [#40818](https://github.com/apache/doris/pull/40818) | +| Adjusted the default value of `max_batch_interval`. | 2.1.8 3.0.3 | [#42491](https://github.com/apache/doris/pull/42491) | +| Removed the restriction on `max_batch_interval`. | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | +| Adjusted the default values of `max_batch_rows` and `max_batch_size`. | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | + +### Observability Optimizations + +| Optimization Content | Applied Versions | Corresponding PR | +|---------------------|------------------|------------------| +| Added observability-related metrics. | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | + +### Error "failed to get latest offset" +**Problem Description**: Routine Load cannot get the latest Kafka offset. + +**Common Causes**: +- Usually due to network connectivity issues with Kafka. Verify by pinging or using telnet to test the Kafka domain name. +- A timeout caused by a third-party library bug, with the error: java.util.concurrent.TimeoutException: Waited X seconds + +### Error "failed to get partition meta: Local:'Broker transport failure" +**Problem Description**: Routine Load cannot get Kafka Topic Partition Meta. + +**Common Causes**: +- Usually due to network connectivity issues with Kafka. Verify by pinging or using telnet to test the Kafka domain name. +- If Kafka is addressed by domain name, try configuring the domain name mapping in /etc/hosts. + +### Error "Broker: Offset out of range" +**Problem Description**: The consumed offset doesn't exist in Kafka, possibly because it has been cleaned up by Kafka. + +**Solution**: +- Specify a new offset for consumption, for example, set the offset to OFFSET_BEGINNING (see the sketch below). +- Set appropriate Kafka log retention parameters for the load speed: log.retention.hours, log.retention.bytes, etc.
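+ +For the first point, a minimal sketch with a hypothetical job name (a Routine Load job must be paused before its offsets can be altered; this assumes the job consumes partition 0): +```sql +-- Rewind a job whose committed offset was already cleaned up by Kafka. +PAUSE ROUTINE LOAD FOR example_job; +ALTER ROUTINE LOAD FOR example_job +FROM kafka ("kafka_partitions" = "0", "kafka_offsets" = "OFFSET_BEGINNING"); +RESUME ROUTINE LOAD FOR example_job; +```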
\ No newline at end of file diff --git a/versioned_docs/version-2.1/faq/routineload-faq.md b/versioned_docs/version-2.1/faq/routineload-faq.md deleted file mode 100644 index 3960f67bfe8..00000000000 --- a/versioned_docs/version-2.1/faq/routineload-faq.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -{ - "title": "Routine Load FAQ", - "language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -# Routine Load FAQ - -This document records common issues, bug fixes, and optimization improvements related to Routine Load in Doris. It will be updated periodically. - -## Major Bug Fixes - -| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR | -| ----------------------------------------------------------- | ------------------------------------------- | ----------------- | ---------------------------------------------------------- | ----------------- | -------------- | ---------------------------------------------------------- | -| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | <2.1.9 <3.0.5 | 2.1.9 3.0.5 | [#47530](https://github.com/apache/doris/pull/47530) | -| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | [#46149](https://github.com/apache/doris/pull/46149) | -| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches. | Shared-nothing and shared-storage | Pause the Routine Load job and execute the following command: `ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false");` | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#45528](https://github.com/apache/doris/pull/45528), [ [...] -| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#44913](https://github.com/apache/doris/pull/44913) | -| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | -| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. 
| <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | - -## Default Configuration Optimizations - -| Optimization Content | Applied Versions | Corresponding PR | -| ------------------------------------------- | ---------------- | ---------------------------------------------------------- | -| Increased the timeout duration for Routine Load. | 2.1.7 3.0.3 | [#42042](https://github.com/apache/doris/pull/42042), [#40818](https://github.com/apache/doris/pull/40818) | -| Adjusted the default value of `max_batch_interval`. | 2.1.8 3.0.3 | [#42491](https://github.com/apache/doris/pull/42491) | -| Removed the restriction on `max_batch_interval`. | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | -| Adjusted the default values of `max_batch_rows` and `max_batch_size`. | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | - -## Observability Optimizations - -| Optimization Content | Applied Versions | Corresponding PR | -| ---------------------------- | ---------------- | ---------------------------------------------------------- | -| Added observability-related metrics. | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | diff --git a/versioned_docs/version-3.0/faq/load-faq.md b/versioned_docs/version-3.0/faq/load-faq.md new file mode 100644 index 00000000000..c9fe8f19ef3 --- /dev/null +++ b/versioned_docs/version-3.0/faq/load-faq.md @@ -0,0 +1,140 @@ +--- +{ + "title": "Load FAQ", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +## General Load FAQ + +### Error "[DATA_QUALITY_ERROR] Encountered unqualified data" +**Problem Description**: Data quality error during loading. + +**Solution**: +- Stream Load and Insert Into operations will return an error URL, while for Broker Load you can check the error URL through the `Show Load` command. +- Use a browser or curl command to access the error URL to view the specific data quality error reasons. +- Use the strict_mode and max_filter_ratio parameters to control the acceptable error rate. + +### Error "[E-235] Failed to init rowset builder" +**Problem Description**: Error -235 occurs when the load frequency is too high and data hasn't been compacted in time, exceeding version limits. + +**Solution**: +- Increase the batch size of data loading and reduce loading frequency. +- Increase the `max_tablet_version_num` parameter in `be.conf`, it is recommended not to exceed 5000. + +### Error "[E-238] Too many segments in rowset" +**Problem Description**: Error -238 occurs when the number of segments under a single rowset exceeds the limit. 
+ +**Common Causes**: +- The bucket number configured during table creation is too small. +- Data skew occurs; consider using more balanced bucket keys. + +### Error "Transaction commit successfully, BUT data will be visible later" +**Problem Description**: Data load is successful but temporarily not visible. + +**Cause**: Usually due to transaction publish delay caused by system resource pressure. + +### Error "Failed to commit kv txn [...] Transaction exceeds byte limit" +**Problem Description**: In shared-storage mode, too many partitions and tablets are involved in a single load, exceeding the transaction size limit. + +**Solution**: +- Load data by partition in batches to reduce the number of partitions involved in a single load. +- Optimize table structure to reduce the number of partitions and tablets. + +### Extra "\r" in the last column of CSV file +**Problem Description**: Usually caused by Windows line endings. + +**Solution**: +Specify the correct line delimiter: `-H "line_delimiter:\r\n"` + +### CSV data with quotes imported as null +**Problem Description**: CSV data with quotes becomes null after import. + +**Solution**: +Use the `trim_double_quotes` parameter to remove double quotes around fields. + +## Stream Load + +### Reasons for Slow Loading +- Bottlenecks in CPU, IO, memory, or network card resources. +- Slow network between the client machine and the BE machines; ping latency from the client to the BEs gives an initial diagnosis. +- Webserver thread bottleneck: too many concurrent Stream Loads on a single BE (exceeding the `webserver_num_workers` setting in `be.conf`) can exhaust the webserver threads. +- Memtable flush thread bottleneck: check the BE metric `doris_be_flush_thread_pool_queue_size` to see whether queuing is severe; increasing `flush_thread_num_per_store` in `be.conf` can help. + +### Handling Special Characters in Column Names +When column names contain special characters, use single quotes with backticks to specify the columns parameter: +```shell +curl --location-trusted -u root:"" \ + -H 'columns:`@coltime`,colint,colvar' \ + -T a.csv \ + -H "column_separator:," \ + http://127.0.0.1:8030/api/db/loadtest/_stream_load +``` + +## Routine Load + +### Major Bug Fixes + +| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR | +|------------------|-------------------|--------------|-------------------|------------------|----------------|---------| +| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | <2.1.9 <3.0.5 | 2.1.9 3.0.5 | [#47530](https://github.com/apache/doris/pull/47530) | +| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | [#46149](https://github.com/apache/doris/pull/46149) | +| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches.
| Shared-nothing and shared-storage | Pause the Routine Load job and execute the following command: `ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false");` | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#45528](https://github.com/apache/doris/pull/45528), [#4494 [...] +| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#44913](https://github.com/apache/doris/pull/44913) | +| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | +| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. | <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | + +### Default Configuration Optimizations + +| Optimization Content | Applied Versions | Corresponding PR | +|---------------------|------------------|------------------| +| Increased the timeout duration for Routine Load. | 2.1.7 3.0.3 | [#42042](https://github.com/apache/doris/pull/42042), [#40818](https://github.com/apache/doris/pull/40818) | +| Adjusted the default value of `max_batch_interval`. | 2.1.8 3.0.3 | [#42491](https://github.com/apache/doris/pull/42491) | +| Removed the restriction on `max_batch_interval`. | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | +| Adjusted the default values of `max_batch_rows` and `max_batch_size`. | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | + +### Observability Optimizations + +| Optimization Content | Applied Versions | Corresponding PR | +|---------------------|------------------|------------------| +| Added observability-related metrics. | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | + +### Error "failed to get latest offset" +**Problem Description**: Routine Load cannot get the latest Kafka offset. + +**Common Causes**: +- Usually due to network connectivity issues with Kafka. Verify by pinging or using telnet to test the Kafka domain name. +- A timeout caused by a third-party library bug, with the error: java.util.concurrent.TimeoutException: Waited X seconds + +### Error "failed to get partition meta: Local:'Broker transport failure" +**Problem Description**: Routine Load cannot get Kafka Topic Partition Meta. + +**Common Causes**: +- Usually due to network connectivity issues with Kafka. Verify by pinging or using telnet to test the Kafka domain name. +- If Kafka is addressed by domain name, try configuring the domain name mapping in /etc/hosts. + +### Error "Broker: Offset out of range" +**Problem Description**: The consumed offset doesn't exist in Kafka, possibly because it has been cleaned up by Kafka. + +**Solution**: +- Specify a new offset for consumption, for example, set the offset to OFFSET_BEGINNING. +- Set appropriate Kafka log retention parameters for the load speed: log.retention.hours, log.retention.bytes, etc.
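+ +When investigating the Kafka errors above, the job state can also be inspected directly; a minimal sketch with a hypothetical job name: +```sql +-- Reports the job's progress, its error log URLs, and the reason the job +-- last changed state, which usually pinpoints offset or connectivity problems. +SHOW ROUTINE LOAD FOR example_job; +```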
\ No newline at end of file diff --git a/versioned_docs/version-3.0/faq/routineload-faq.md b/versioned_docs/version-3.0/faq/routineload-faq.md deleted file mode 100644 index 3960f67bfe8..00000000000 --- a/versioned_docs/version-3.0/faq/routineload-faq.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -{ - "title": "Routine Load FAQ", - "language": "en" -} ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. ---> - -# Routine Load FAQ - -This document records common issues, bug fixes, and optimization improvements related to Routine Load in Doris. It will be updated periodically. - -## Major Bug Fixes - -| Issue Description | Trigger Conditions | Impact Scope | Temporary Solution | Affected Versions | Fixed Versions | Fix PR | -| ----------------------------------------------------------- | ------------------------------------------- | ----------------- | ---------------------------------------------------------- | ----------------- | -------------- | ---------------------------------------------------------- | -| When at least one job times out while connecting to Kafka, it affects the import of other jobs, slowing down global Routine Load imports. | At least one job times out while connecting to Kafka. | Shared-nothing and shared-storage | Stop or manually pause the job to resolve the issue. | <2.1.9 <3.0.5 | 2.1.9 3.0.5 | [#47530](https://github.com/apache/doris/pull/47530) | -| User data may be lost after restarting the FE Master. | The job's offset is set to OFFSET_END, and the FE is restarted. | Shared-storage | Change the consumption mode to OFFSET_BEGINNING. | 3.0.2-3.0.4 | 3.0.5 | [#46149](https://github.com/apache/doris/pull/46149) | -| A large number of small transactions are generated during import, causing compaction to fail and resulting in continuous -235 errors. | Doris consumes data too quickly, or Kafka data flow is in small batches. | Shared-nothing and shared-storage | Pause the Routine Load job and execute the following command: `ALTER ROUTINE LOAD FOR jobname FROM kafka ("property.enable.partition.eof" = "false");` | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#45528](https://github.com/apache/doris/pull/45528), [ [...] -| Kafka third-party library destructor hangs, causing data consumption to fail. | Kafka topic deletion (possibly other conditions). | Shared-nothing and shared-storage | Restart all BE nodes. | <2.1.8 <3.0.4 | 2.1.8 3.0.4 | [#44913](https://github.com/apache/doris/pull/44913) | -| Routine Load scheduling hangs. | Timeout occurs when FE aborts a transaction in Meta Service. | Shared-storage | Restart the FE node. | <3.0.2 | 3.0.2 | [#41267](https://github.com/apache/doris/pull/41267) | -| Routine Load restart issue. | Restarting BE nodes. | Shared-nothing and shared-storage | Manually resume the job. 
| <2.1.7 <3.0.2 | 2.1.7 3.0.2 | [#3727](https://github.com/selectdb/selectdb-core/pull/3727) | - -## Default Configuration Optimizations - -| Optimization Content | Applied Versions | Corresponding PR | -| ------------------------------------------- | ---------------- | ---------------------------------------------------------- | -| Increased the timeout duration for Routine Load. | 2.1.7 3.0.3 | [#42042](https://github.com/apache/doris/pull/42042), [#40818](https://github.com/apache/doris/pull/40818) | -| Adjusted the default value of `max_batch_interval`. | 2.1.8 3.0.3 | [#42491](https://github.com/apache/doris/pull/42491) | -| Removed the restriction on `max_batch_interval`. | 2.1.5 3.0.0 | [#29071](https://github.com/apache/doris/pull/29071) | -| Adjusted the default values of `max_batch_rows` and `max_batch_size`. | 2.1.5 3.0.0 | [#36632](https://github.com/apache/doris/pull/36632) | - -## Observability Optimizations - -| Optimization Content | Applied Versions | Corresponding PR | -| ---------------------------- | ---------------- | ---------------------------------------------------------- | -| Added observability-related metrics. | 3.0.5 | [#48209](https://github.com/apache/doris/pull/48209), [#48171](https://github.com/apache/doris/pull/48171), [#48963](https://github.com/apache/doris/pull/48963) | diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json index 83ac7c6ec6d..0a7ede3136a 100644 --- a/versioned_sidebars/version-2.1-sidebars.json +++ b/versioned_sidebars/version-2.1-sidebars.json @@ -832,7 +832,7 @@ "faq/lakehouse-faq", "faq/bi-faq", "faq/correctness-faq", - "faq/routineload-faq" + "faq/load-faq" ] }, { diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json index 4d80dc9d719..ed9fbeb39f9 100644 --- a/versioned_sidebars/version-3.0-sidebars.json +++ b/versioned_sidebars/version-3.0-sidebars.json @@ -885,7 +885,7 @@ "faq/lakehouse-faq", "faq/bi-faq", "faq/correctness-faq", - "faq/routineload-faq" + "faq/load-faq" ] }, { --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org