This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git

The following commit(s) were added to refs/heads/master by this push:
     new 2f452f82384  [doc](load) add bigquery datasource doc (#2318)
2f452f82384 is described below

commit 2f452f82384bee1330f8cf611b64662b6c30aea2
Author: wudi <w...@selectdb.com>
AuthorDate: Thu Apr 24 19:58:38 2025 +0800

    [doc](load) add bigquery datasource doc (#2318)

    ## Versions
    - [x] dev
    - [x] 3.0
    - [x] 2.1
    - [ ] 2.0

    ## Languages
    - [x] Chinese
    - [x] English

    ## Docs Checklist
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/data-operate/import/data-source/bigquery.md   | 220 +++++++++++++++++++++
 .../data-operate/import/data-source/bigquery.md    | 218 ++++++++++++++++++++
 .../data-operate/import/data-source/bigquery.md    | 218 ++++++++++++++++++++
 .../data-operate/import/data-source/bigquery.md    | 218 ++++++++++++++++++++
 sidebars.json                                      |   1 +
 static/images/data-operate/gcs_export.png          | Bin 0 -> 159797 bytes
 .../data-operate/import/data-source/bigquery.md    | 220 +++++++++++++++++++++
 .../data-operate/import/data-source/bigquery.md    | 220 +++++++++++++++++++++
 versioned_sidebars/version-2.1-sidebars.json       |   1 +
 versioned_sidebars/version-3.0-sidebars.json       |   1 +
 10 files changed, 1317 insertions(+)

diff --git a/docs/data-operate/import/data-source/bigquery.md b/docs/data-operate/import/data-source/bigquery.md
new file mode 100644
index 00000000000..417535e3902
--- /dev/null
+++ b/docs/data-operate/import/data-source/bigquery.md
@@ -0,0 +1,220 @@
+---
+{
+    "title": "BigQuery",
+    "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+When migrating data from BigQuery, object storage is usually used as an intermediate medium. The core process is as follows: first, use BigQuery's [Export](https://cloud.google.com/bigquery/docs/exporting-data) statement to export the data to GCS (Google Cloud Storage); then, use Doris's S3 Load feature to read the data from object storage and load it into Doris. For details, please refer to [S3 Load](./amazon-s3.md).
+
+## Considerations
+
+1. Before the migration, choose the Doris [Data Model](../../../table-design/data-model/overview.md), as well as the [Partitioning](../../../table-design/data-partitioning/dynamic-partitioning.md) and [Bucketing](../../../table-design/data-partitioning/data-bucketing.md) strategies, based on the BigQuery table structure. For more table creation guidance, please refer to [Load Best Practices](../load-best-practices.md).
+2. BigQuery does not support exporting JSON columns to Parquet; export such tables in JSON format instead.
+3. When exporting TIME columns, cast them to STRING in the export query (see the sketch below).
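+
+For example, a table that contains a TIME column can still be exported to Parquet by casting that column in the export query. The following is a minimal sketch only: the `test.events` table and its `event_id`, `event_time`, and `event_date` columns are hypothetical and are used purely to illustrate points 2 and 3 above; adjust the GCS URI to your own bucket.
+
+```sql
+-- Hypothetical table: test.events(event_id INT64, event_time TIME, event_date DATE).
+-- The TIME column is cast to STRING so that the Parquet export succeeds;
+-- on the Doris side it maps to a STRING/VARCHAR column, as listed in the
+-- type mapping table below.
+EXPORT DATA
+  OPTIONS (
+    uri = 'gs://mybucket/export/events/*.parquet',
+    format = 'PARQUET')
+AS (
+  SELECT
+    event_id,
+    CAST(event_time AS STRING) AS event_time,
+    event_date
+  FROM test.events
+);
+```
+
+If the table also contains JSON columns, export it with `format = 'JSON'` (newline-delimited) instead, as noted in point 2.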
+ +## Data type mapping + +| BigQuery | Doris | Comment | +| ------------------ | -------------- | -------------------- | +| Array | Array | | +| BOOLEAN | BOOLEAN | | +| DATE | DATE | | +| DATETIME/TIMESTAMP | DATETIME | | +| JSON | JSON | | +| INT64 | BIGINT | | +| NUMERIC | DECIMAL | | +| FLOAT64 | DOUBLE | | +| STRING | VARCHAR/STRING | VARCHAR maximum length is 65535 | +| STRUCT | STRUCT | | +| TIME | STRING | | +| OTHER | UNSUPPORTED | | + +## 1. Create Table + +When migrating a BigQuery table to Doris, it is necessary to create a Doris table first. + +Suppose we already have the following table and data in BigQuery. + +```SQL +CREATE OR REPLACE TABLE test.sales_data ( + order_id INT64, + customer_name STRING, + order_date DATE, + amount NUMERIC(10,2), + country STRING +) +PARTITION BY order_date + + +INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES +(1, 'Alice', '2025-04-08', 99.99, 'USA'), +(2, 'Bob', '2025-04-08', 149.50, 'Canada'), +(3, 'Charlie', '2025-04-09', 75.00, 'UK'), +(4, 'Diana', '2025-04-10', 200.00, 'Australia'); +``` + +According to this table structure, a Doris primary key partitioned table can be created. The partition field should be the same as that in BigQuery, and the table should be partitioned on a daily basis. + +```sql +CREATE TABLE `sales_data` ( + order_id INT, + order_date DATE NOT NULL, + customer_name VARCHAR(128), + amount DECIMAL(10,2), + country VARCHAR(48) +) ENGINE=OLAP +UNIQUE KEY(`order_id`,`order_date`) +PARTITION BY RANGE(`order_date`) () +DISTRIBUTED BY HASH(`order_id`) BUCKETS 16 +PROPERTIES ( + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "DAY", + "dynamic_partition.start" = "-10", + "dynamic_partition.end" = "10", + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "16", + "replication_num" = "1" +); +``` + +## 2. Export BigQuery Data + +1. **Export to GCS Parquet format file through Export method** + ```sql + EXPORT DATA + OPTIONS ( + uri = 'gs://mybucket/export/sales_data/*.parquet', + format = 'PARQUET') + AS ( + SELECT * + FROM test.sales_data + ); + ``` +2. **View the exported files on GCS** + The above command will export the data of sales_data to GCS, and each partition will generate one or more files with increasing file names. For details, please refer to [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files), as follows: +  + + +## 3. Load Data to Doris + +S3 Load is an asynchronous data load method. After execution, Doris actively pulls data from the data source. The data source supports object storage compatible with the S3 protocol, including ([AWS S3](./amazon-s3.md),[GCS](./google-cloud-storage.md),[AZURE](./azure-storage.md),etc) + +This method is suitable for scenarios involving large volumes of data that require asynchronous processing in the background. For data imports that need to be handled synchronously, refer to [TVF Load](./amazon-s3.md#load-with-tvf)。 + +*Note: For **Parquet/ORC format files that contain complex types (Struct/Array/Map)**, TVF Load must be used.* + +1. 
**Loading data from a single file** + + ```sql + LOAD LABEL sales_data_2025_04_08 + ( + DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet") + INTO TABLE sales_data + FORMAT AS "parquet" + (order_id, order_date, customer_name, amount, country) + ) + WITH S3 + ( + "provider" = "GCP", + "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com", + "s3.region" = "asia-southeast1", + "s3.access_key" = "<ak>", + "s3.secret_key" = "<sk>" + ); + ``` + +2. **Check Load Status via SHOW LOAD** + + Since S3 Load import is submitted asynchronously, you can check the status of a specific label using SHOW LOAD: + + ```yaml + mysql> show load where label = "label_sales_data_2025_04_08"\G + *************************** 1. row *************************** + JobId: 17956078 + Label: label_sales_data_2025_04_08 + State: FINISHED + Progress: 100.00% (1/1) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: NULL + CreateTime: 2025-04-10 17:50:53 + EtlStartTime: 2025-04-10 17:50:54 + EtlFinishTime: 2025-04-10 17:50:54 + LoadStartTime: 2025-04-10 17:50:54 + LoadFinishTime: 2025-04-10 17:50:54 + URL: NULL + JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620} + TransactionId: 766228 + ErrorTablets: {} + User: root + Comment: + 1 row in set (0.00 sec) + ``` + +3. **Handle Load Errors** + + When there are multiple load tasks, you can use the following statement to query the dates and reasons for data load failures. + + ```yaml + mysql> show load where state='CANCELLED' and label like "label_test%"\G + *************************** 1. row *************************** + JobId: 18312384 + Label: label_test123 + State: CANCELLED + Progress: 100.00% (3/3) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel + CreateTime: 2025-04-15 17:32:59 + EtlStartTime: 2025-04-15 17:33:02 + EtlFinishTime: 2025-04-15 17:33:02 + LoadStartTime: 2025-04-15 17:33:02 + LoadFinishTime: 2025-04-15 17:33:02 + URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342 + JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839} + TransactionId: 769213 + ErrorTablets: {} + User: root + Comment: + ``` + + As shown in the example above, the issue is a **data quality error**(ETL_QUALITY_UNSATISFIED). To view the detailed error, you need to visit the URL provided in the result. For example, the data exceeded the defined length of the country column in the table schema: + + ```python + [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342" + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . 
src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line []; + ``` + + For data quality errors, if you want to allow skipping erroneous records, you can set a fault tolerance rate in the Properties section of the S3 Load task. For details, refer to [Load Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations)。 + +1. **Loading data from multiple files** + + When migrating a large volume of historical data, it is recommended to use a batch load strategy. Each batch corresponds to one or a few partitions in Doris. It is recommended to keep the data size under 100GB per batch to reduce system load and lower the cost of retries in case of load failures. + + You can refer to the script [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh), which can split the file list under the specified directory on the object storage and submit multiple S3 Load tasks to Doris in batches to achieve the effect of batch load. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/bigquery.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/bigquery.md new file mode 100644 index 00000000000..853521633cf --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/bigquery.md @@ -0,0 +1,218 @@ +--- +{ + "title": "BigQuery", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +在迁移 BigQuery 的过程中,通常需要借助对象存储作为中间媒介。核心流程如下:首先通过 BigQuery 的 [Export](https://cloud.google.com/bigquery/docs/exporting-data) 语句将数据导出到 GCS(Google Cloud Storage);再利用 Doris 的 S3 Load 功能从对象存储中读取数据并导入到 Doris 中,具体可参考 [S3 导入](./amazon-s3.md)。 + +## 注意事项 + +1. 在迁移之前,需要根据 BigQuery 的表结构选择 Doris 的[数据模型](../../../table-design/data-model/overview.md),以及[分区](../../../table-design/data-partitioning/dynamic-partitioning.md)和[分桶](../../../table-design/data-partitioning/data-bucketing.md)的策略,更多创建表策略可参考[导入最佳实践](../load-best-practices.md)。 +2. BigQuery 导出 JSON 类型时,不支持 Parquet 格式导出,可使用 JSON 格式导出。 +3. 
BigQuery 导出 Time 类型时,需要 Cast String 类型导出。 + +## 数据类型映射 + +| BigQuery | Doris | 备注 | +| ------------------ | -------------- | -------------------- | +| Array | Array | | +| BOOLEAN | BOOLEAN | | +| DATE | DATE | | +| DATETIME/TIMESTAMP | DATETIME | | +| JSON | JSON | | +| INT64 | BIGINT | | +| NUMERIC | DECIMAL | | +| FLOAT64 | DOUBLE | | +| STRING | VARCHAR/STRING | VARCHAR 长度最大 65535 | +| STRUCT | STRUCT | | +| TIME | STRING | | +| OTHER | UNSUPPORTED | | + +## 1. 创建表 + +在迁移 BigQuery 表到 Doris 中的时候,需要先创建 Doris 表。 + +假设我们在 BigQuery 中已存在如下表和数据 + +```SQL +CREATE OR REPLACE TABLE test.sales_data ( + order_id INT64, + customer_name STRING, + order_date DATE, + amount NUMERIC(10,2), + country STRING +) +PARTITION BY order_date + + +INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES +(1, 'Alice', '2025-04-08', 99.99, 'USA'), +(2, 'Bob', '2025-04-08', 149.50, 'Canada'), +(3, 'Charlie', '2025-04-09', 75.00, 'UK'), +(4, 'Diana', '2025-04-10', 200.00, 'Australia'); +``` + +根据这个表结构,可以创建 Doris 主键分区表,分区字段和 Bigquery 的分区字段一致,同时按天分区 + +```sql +CREATE TABLE `sales_data` ( + order_id INT, + order_date DATE NOT NULL, + customer_name VARCHAR(128), + amount DECIMAL(10,2), + country VARCHAR(48) +) ENGINE=OLAP +UNIQUE KEY(`order_id`,`order_date`) +PARTITION BY RANGE(`order_date`) () +DISTRIBUTED BY HASH(`order_id`) BUCKETS 16 +PROPERTIES ( + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "DAY", + "dynamic_partition.start" = "-10", + "dynamic_partition.end" = "10", + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "16", + "replication_num" = "1" +); +``` + +## 2. 导出 BigQuery 数据 + +1. **通过 Export 方式导出到 GCS Parquet 格式的文件** + ```sql + EXPORT DATA + OPTIONS ( + uri = 'gs://mybucket/export/sales_data/*.parquet', + format = 'PARQUET') + AS ( + SELECT * + FROM test.sales_data + ); + ``` +2. **查看 GCS 上的导出文件** + 以上命令会将 sales_data 的数据导出到 GCS 上,并且每个分区会产生一个或多个文件,文件名递增,具体可参考[exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files),如下 +  + +## 3. 导入数据到 Doris + +导入使用 S3 Load 进行导入,**S3 Load 是一种异步的数据导入方式,执行后 Doris 会主动从数据源拉取数据**,数据源支持兼容 S3 协议的对象存储,包括 ([AWS S3](./amazon-s3.md),[GCS](./google-cloud-storage.md),[AZURE](./azure-storage.md)等)。 + +该方式适用于数据量大、需要后台异步处理的场景。对于需要同步处理的数据导入,可以参考 [TVF 导入](./amazon-s3.md#load-with-tvf)。 + +*注意:对于含有复杂类型(Struct/Array/Map)的 Parquet/ORC 格式文件导入,目前必须使用 TVF 导入* + +1. **导入单个文件的数据** + + ```sql + LOAD LABEL sales_data_2025_04_08 + ( + DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet") + INTO TABLE sales_data + FORMAT AS "parquet" + (order_id, order_date, customer_name, amount, country) + ) + WITH S3 + ( + "provider" = "GCP", + "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com", + "s3.region" = "asia-southeast1", + "s3.access_key" = "<ak>", + "s3.secret_key" = "<sk>" + ); + ``` + +2. **通过 Show Load 查看任务运行情况** + + 由于 S3Load 导入是异步提交的,所以需要通过 show load 可以查看指定 label 的导入情况: + + ```yaml + mysql> show load where label = "label_sales_data_2025_04_08"\G + *************************** 1. 
row *************************** + JobId: 17956078 + Label: label_sales_data_2025_04_08 + State: FINISHED + Progress: 100.00% (1/1) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: NULL + CreateTime: 2025-04-10 17:50:53 + EtlStartTime: 2025-04-10 17:50:54 + EtlFinishTime: 2025-04-10 17:50:54 + LoadStartTime: 2025-04-10 17:50:54 + LoadFinishTime: 2025-04-10 17:50:54 + URL: NULL + JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620} + TransactionId: 766228 + ErrorTablets: {} + User: root + Comment: + 1 row in set (0.00 sec) + ``` + +3. **处理导入过程中的错误** + + 当有多个导入任务时,可以通过以下语句,查询数据导入失败的日期和原因。 + + ```yaml + mysql> show load where state='CANCELLED' and label like "label_test%"\G + *************************** 1. row *************************** + JobId: 18312384 + Label: label_test123 + State: CANCELLED + Progress: 100.00% (3/3) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel + CreateTime: 2025-04-15 17:32:59 + EtlStartTime: 2025-04-15 17:33:02 + EtlFinishTime: 2025-04-15 17:33:02 + LoadStartTime: 2025-04-15 17:33:02 + LoadFinishTime: 2025-04-15 17:33:02 + URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342 + JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839} + TransactionId: 769213 + ErrorTablets: {} + User: root + Comment: + ``` + + 如上面的例子是**数据质量错误**(ETL_QUALITY_UNSATISFIED),具体错误需要通过访问返回的 URL 的链接进行查看,如下是数据超过了表中的 Schema 中 country 列的实际长度: + + ```python + [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342" + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line []; + ``` + + 同时对于数据质量的错误,如果可以允许错误数据跳过的,可以通过在 S3 Load 任务中 Properties 设置容错率,具体可参考[导入配置参数](../../import/import-way/broker-load-manual.md#related-configurations)。 + +4. 
**导入多个文件的数据** + + 当需要迁移大数据量的存量数据时,建议使用分批导入的策略。每批数据对应 Doris 的一个分区或少量几个分区,数据量建议不超过 100GB,以减轻系统压力并降低导入失败后的重试成本。 + + 可参考脚本 [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh),该脚本可以对对象存储上指定目录下的文件列表进行拆分,分批提交多个 S3 Load 任务到 Doris 中,实现批量导入的效果。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/bigquery.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/bigquery.md new file mode 100644 index 00000000000..853521633cf --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/bigquery.md @@ -0,0 +1,218 @@ +--- +{ + "title": "BigQuery", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +在迁移 BigQuery 的过程中,通常需要借助对象存储作为中间媒介。核心流程如下:首先通过 BigQuery 的 [Export](https://cloud.google.com/bigquery/docs/exporting-data) 语句将数据导出到 GCS(Google Cloud Storage);再利用 Doris 的 S3 Load 功能从对象存储中读取数据并导入到 Doris 中,具体可参考 [S3 导入](./amazon-s3.md)。 + +## 注意事项 + +1. 在迁移之前,需要根据 BigQuery 的表结构选择 Doris 的[数据模型](../../../table-design/data-model/overview.md),以及[分区](../../../table-design/data-partitioning/dynamic-partitioning.md)和[分桶](../../../table-design/data-partitioning/data-bucketing.md)的策略,更多创建表策略可参考[导入最佳实践](../load-best-practices.md)。 +2. BigQuery 导出 JSON 类型时,不支持 Parquet 格式导出,可使用 JSON 格式导出。 +3. BigQuery 导出 Time 类型时,需要 Cast String 类型导出。 + +## 数据类型映射 + +| BigQuery | Doris | 备注 | +| ------------------ | -------------- | -------------------- | +| Array | Array | | +| BOOLEAN | BOOLEAN | | +| DATE | DATE | | +| DATETIME/TIMESTAMP | DATETIME | | +| JSON | JSON | | +| INT64 | BIGINT | | +| NUMERIC | DECIMAL | | +| FLOAT64 | DOUBLE | | +| STRING | VARCHAR/STRING | VARCHAR 长度最大 65535 | +| STRUCT | STRUCT | | +| TIME | STRING | | +| OTHER | UNSUPPORTED | | + +## 1. 
创建表 + +在迁移 BigQuery 表到 Doris 中的时候,需要先创建 Doris 表。 + +假设我们在 BigQuery 中已存在如下表和数据 + +```SQL +CREATE OR REPLACE TABLE test.sales_data ( + order_id INT64, + customer_name STRING, + order_date DATE, + amount NUMERIC(10,2), + country STRING +) +PARTITION BY order_date + + +INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES +(1, 'Alice', '2025-04-08', 99.99, 'USA'), +(2, 'Bob', '2025-04-08', 149.50, 'Canada'), +(3, 'Charlie', '2025-04-09', 75.00, 'UK'), +(4, 'Diana', '2025-04-10', 200.00, 'Australia'); +``` + +根据这个表结构,可以创建 Doris 主键分区表,分区字段和 Bigquery 的分区字段一致,同时按天分区 + +```sql +CREATE TABLE `sales_data` ( + order_id INT, + order_date DATE NOT NULL, + customer_name VARCHAR(128), + amount DECIMAL(10,2), + country VARCHAR(48) +) ENGINE=OLAP +UNIQUE KEY(`order_id`,`order_date`) +PARTITION BY RANGE(`order_date`) () +DISTRIBUTED BY HASH(`order_id`) BUCKETS 16 +PROPERTIES ( + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "DAY", + "dynamic_partition.start" = "-10", + "dynamic_partition.end" = "10", + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "16", + "replication_num" = "1" +); +``` + +## 2. 导出 BigQuery 数据 + +1. **通过 Export 方式导出到 GCS Parquet 格式的文件** + ```sql + EXPORT DATA + OPTIONS ( + uri = 'gs://mybucket/export/sales_data/*.parquet', + format = 'PARQUET') + AS ( + SELECT * + FROM test.sales_data + ); + ``` +2. **查看 GCS 上的导出文件** + 以上命令会将 sales_data 的数据导出到 GCS 上,并且每个分区会产生一个或多个文件,文件名递增,具体可参考[exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files),如下 +  + +## 3. 导入数据到 Doris + +导入使用 S3 Load 进行导入,**S3 Load 是一种异步的数据导入方式,执行后 Doris 会主动从数据源拉取数据**,数据源支持兼容 S3 协议的对象存储,包括 ([AWS S3](./amazon-s3.md),[GCS](./google-cloud-storage.md),[AZURE](./azure-storage.md)等)。 + +该方式适用于数据量大、需要后台异步处理的场景。对于需要同步处理的数据导入,可以参考 [TVF 导入](./amazon-s3.md#load-with-tvf)。 + +*注意:对于含有复杂类型(Struct/Array/Map)的 Parquet/ORC 格式文件导入,目前必须使用 TVF 导入* + +1. **导入单个文件的数据** + + ```sql + LOAD LABEL sales_data_2025_04_08 + ( + DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet") + INTO TABLE sales_data + FORMAT AS "parquet" + (order_id, order_date, customer_name, amount, country) + ) + WITH S3 + ( + "provider" = "GCP", + "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com", + "s3.region" = "asia-southeast1", + "s3.access_key" = "<ak>", + "s3.secret_key" = "<sk>" + ); + ``` + +2. **通过 Show Load 查看任务运行情况** + + 由于 S3Load 导入是异步提交的,所以需要通过 show load 可以查看指定 label 的导入情况: + + ```yaml + mysql> show load where label = "label_sales_data_2025_04_08"\G + *************************** 1. row *************************** + JobId: 17956078 + Label: label_sales_data_2025_04_08 + State: FINISHED + Progress: 100.00% (1/1) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: NULL + CreateTime: 2025-04-10 17:50:53 + EtlStartTime: 2025-04-10 17:50:54 + EtlFinishTime: 2025-04-10 17:50:54 + LoadStartTime: 2025-04-10 17:50:54 + LoadFinishTime: 2025-04-10 17:50:54 + URL: NULL + JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620} + TransactionId: 766228 + ErrorTablets: {} + User: root + Comment: + 1 row in set (0.00 sec) + ``` + +3. 
**处理导入过程中的错误** + + 当有多个导入任务时,可以通过以下语句,查询数据导入失败的日期和原因。 + + ```yaml + mysql> show load where state='CANCELLED' and label like "label_test%"\G + *************************** 1. row *************************** + JobId: 18312384 + Label: label_test123 + State: CANCELLED + Progress: 100.00% (3/3) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel + CreateTime: 2025-04-15 17:32:59 + EtlStartTime: 2025-04-15 17:33:02 + EtlFinishTime: 2025-04-15 17:33:02 + LoadStartTime: 2025-04-15 17:33:02 + LoadFinishTime: 2025-04-15 17:33:02 + URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342 + JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839} + TransactionId: 769213 + ErrorTablets: {} + User: root + Comment: + ``` + + 如上面的例子是**数据质量错误**(ETL_QUALITY_UNSATISFIED),具体错误需要通过访问返回的 URL 的链接进行查看,如下是数据超过了表中的 Schema 中 country 列的实际长度: + + ```python + [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342" + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line []; + ``` + + 同时对于数据质量的错误,如果可以允许错误数据跳过的,可以通过在 S3 Load 任务中 Properties 设置容错率,具体可参考[导入配置参数](../../import/import-way/broker-load-manual.md#related-configurations)。 + +4. **导入多个文件的数据** + + 当需要迁移大数据量的存量数据时,建议使用分批导入的策略。每批数据对应 Doris 的一个分区或少量几个分区,数据量建议不超过 100GB,以减轻系统压力并降低导入失败后的重试成本。 + + 可参考脚本 [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh),该脚本可以对对象存储上指定目录下的文件列表进行拆分,分批提交多个 S3 Load 任务到 Doris 中,实现批量导入的效果。 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/bigquery.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/bigquery.md new file mode 100644 index 00000000000..853521633cf --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/bigquery.md @@ -0,0 +1,218 @@ +--- +{ + "title": "BigQuery", + "language": "zh-CN" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +在迁移 BigQuery 的过程中,通常需要借助对象存储作为中间媒介。核心流程如下:首先通过 BigQuery 的 [Export](https://cloud.google.com/bigquery/docs/exporting-data) 语句将数据导出到 GCS(Google Cloud Storage);再利用 Doris 的 S3 Load 功能从对象存储中读取数据并导入到 Doris 中,具体可参考 [S3 导入](./amazon-s3.md)。 + +## 注意事项 + +1. 在迁移之前,需要根据 BigQuery 的表结构选择 Doris 的[数据模型](../../../table-design/data-model/overview.md),以及[分区](../../../table-design/data-partitioning/dynamic-partitioning.md)和[分桶](../../../table-design/data-partitioning/data-bucketing.md)的策略,更多创建表策略可参考[导入最佳实践](../load-best-practices.md)。 +2. BigQuery 导出 JSON 类型时,不支持 Parquet 格式导出,可使用 JSON 格式导出。 +3. BigQuery 导出 Time 类型时,需要 Cast String 类型导出。 + +## 数据类型映射 + +| BigQuery | Doris | 备注 | +| ------------------ | -------------- | -------------------- | +| Array | Array | | +| BOOLEAN | BOOLEAN | | +| DATE | DATE | | +| DATETIME/TIMESTAMP | DATETIME | | +| JSON | JSON | | +| INT64 | BIGINT | | +| NUMERIC | DECIMAL | | +| FLOAT64 | DOUBLE | | +| STRING | VARCHAR/STRING | VARCHAR 长度最大 65535 | +| STRUCT | STRUCT | | +| TIME | STRING | | +| OTHER | UNSUPPORTED | | + +## 1. 创建表 + +在迁移 BigQuery 表到 Doris 中的时候,需要先创建 Doris 表。 + +假设我们在 BigQuery 中已存在如下表和数据 + +```SQL +CREATE OR REPLACE TABLE test.sales_data ( + order_id INT64, + customer_name STRING, + order_date DATE, + amount NUMERIC(10,2), + country STRING +) +PARTITION BY order_date + + +INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES +(1, 'Alice', '2025-04-08', 99.99, 'USA'), +(2, 'Bob', '2025-04-08', 149.50, 'Canada'), +(3, 'Charlie', '2025-04-09', 75.00, 'UK'), +(4, 'Diana', '2025-04-10', 200.00, 'Australia'); +``` + +根据这个表结构,可以创建 Doris 主键分区表,分区字段和 Bigquery 的分区字段一致,同时按天分区 + +```sql +CREATE TABLE `sales_data` ( + order_id INT, + order_date DATE NOT NULL, + customer_name VARCHAR(128), + amount DECIMAL(10,2), + country VARCHAR(48) +) ENGINE=OLAP +UNIQUE KEY(`order_id`,`order_date`) +PARTITION BY RANGE(`order_date`) () +DISTRIBUTED BY HASH(`order_id`) BUCKETS 16 +PROPERTIES ( + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "DAY", + "dynamic_partition.start" = "-10", + "dynamic_partition.end" = "10", + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "16", + "replication_num" = "1" +); +``` + +## 2. 导出 BigQuery 数据 + +1. **通过 Export 方式导出到 GCS Parquet 格式的文件** + ```sql + EXPORT DATA + OPTIONS ( + uri = 'gs://mybucket/export/sales_data/*.parquet', + format = 'PARQUET') + AS ( + SELECT * + FROM test.sales_data + ); + ``` +2. **查看 GCS 上的导出文件** + 以上命令会将 sales_data 的数据导出到 GCS 上,并且每个分区会产生一个或多个文件,文件名递增,具体可参考[exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files),如下 +  + +## 3. 导入数据到 Doris + +导入使用 S3 Load 进行导入,**S3 Load 是一种异步的数据导入方式,执行后 Doris 会主动从数据源拉取数据**,数据源支持兼容 S3 协议的对象存储,包括 ([AWS S3](./amazon-s3.md),[GCS](./google-cloud-storage.md),[AZURE](./azure-storage.md)等)。 + +该方式适用于数据量大、需要后台异步处理的场景。对于需要同步处理的数据导入,可以参考 [TVF 导入](./amazon-s3.md#load-with-tvf)。 + +*注意:对于含有复杂类型(Struct/Array/Map)的 Parquet/ORC 格式文件导入,目前必须使用 TVF 导入* + +1. 
**导入单个文件的数据** + + ```sql + LOAD LABEL sales_data_2025_04_08 + ( + DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet") + INTO TABLE sales_data + FORMAT AS "parquet" + (order_id, order_date, customer_name, amount, country) + ) + WITH S3 + ( + "provider" = "GCP", + "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com", + "s3.region" = "asia-southeast1", + "s3.access_key" = "<ak>", + "s3.secret_key" = "<sk>" + ); + ``` + +2. **通过 Show Load 查看任务运行情况** + + 由于 S3Load 导入是异步提交的,所以需要通过 show load 可以查看指定 label 的导入情况: + + ```yaml + mysql> show load where label = "label_sales_data_2025_04_08"\G + *************************** 1. row *************************** + JobId: 17956078 + Label: label_sales_data_2025_04_08 + State: FINISHED + Progress: 100.00% (1/1) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: NULL + CreateTime: 2025-04-10 17:50:53 + EtlStartTime: 2025-04-10 17:50:54 + EtlFinishTime: 2025-04-10 17:50:54 + LoadStartTime: 2025-04-10 17:50:54 + LoadFinishTime: 2025-04-10 17:50:54 + URL: NULL + JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620} + TransactionId: 766228 + ErrorTablets: {} + User: root + Comment: + 1 row in set (0.00 sec) + ``` + +3. **处理导入过程中的错误** + + 当有多个导入任务时,可以通过以下语句,查询数据导入失败的日期和原因。 + + ```yaml + mysql> show load where state='CANCELLED' and label like "label_test%"\G + *************************** 1. row *************************** + JobId: 18312384 + Label: label_test123 + State: CANCELLED + Progress: 100.00% (3/3) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel + CreateTime: 2025-04-15 17:32:59 + EtlStartTime: 2025-04-15 17:33:02 + EtlFinishTime: 2025-04-15 17:33:02 + LoadStartTime: 2025-04-15 17:33:02 + LoadFinishTime: 2025-04-15 17:33:02 + URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342 + JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839} + TransactionId: 769213 + ErrorTablets: {} + User: root + Comment: + ``` + + 如上面的例子是**数据质量错误**(ETL_QUALITY_UNSATISFIED),具体错误需要通过访问返回的 URL 的链接进行查看,如下是数据超过了表中的 Schema 中 country 列的实际长度: + + ```python + [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342" + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . 
src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line []; + ``` + + 同时对于数据质量的错误,如果可以允许错误数据跳过的,可以通过在 S3 Load 任务中 Properties 设置容错率,具体可参考[导入配置参数](../../import/import-way/broker-load-manual.md#related-configurations)。 + +4. **导入多个文件的数据** + + 当需要迁移大数据量的存量数据时,建议使用分批导入的策略。每批数据对应 Doris 的一个分区或少量几个分区,数据量建议不超过 100GB,以减轻系统压力并降低导入失败后的重试成本。 + + 可参考脚本 [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh),该脚本可以对对象存储上指定目录下的文件列表进行拆分,分批提交多个 S3 Load 任务到 Doris 中,实现批量导入的效果。 \ No newline at end of file diff --git a/sidebars.json b/sidebars.json index 827b1e838e8..97ff0f06744 100644 --- a/sidebars.json +++ b/sidebars.json @@ -158,6 +158,7 @@ "data-operate/import/data-source/minio", "data-operate/import/data-source/s3-compatible", "data-operate/import/data-source/snowflake", + "data-operate/import/data-source/bigquery", "data-operate/import/data-source/migrate-data-from-other-olap", "data-operate/import/data-source/migrate-data-from-other-oltp" ] diff --git a/static/images/data-operate/gcs_export.png b/static/images/data-operate/gcs_export.png new file mode 100644 index 00000000000..56cb84fe06a Binary files /dev/null and b/static/images/data-operate/gcs_export.png differ diff --git a/versioned_docs/version-2.1/data-operate/import/data-source/bigquery.md b/versioned_docs/version-2.1/data-operate/import/data-source/bigquery.md new file mode 100644 index 00000000000..417535e3902 --- /dev/null +++ b/versioned_docs/version-2.1/data-operate/import/data-source/bigquery.md @@ -0,0 +1,220 @@ +--- +{ + "title": "BigQuery", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +During the process of migrating BigQuery, it is usually necessary to use object storage as an intermediate medium. The core process is as follows: First, use BigQuery's [Export](https://cloud.google.com/bigquery/docs/exporting-data) statement to export data to GCS (Google Cloud Storage); then, use Doris's S3 Load function to read data from the object storage and load it into Doris. For details, please refer to [S3 Load](./amazon-s3.md). + + +## Considerations + +1. Before the migration, it is necessary to select Doris' [Data Model](../../../table-design/data-model/overview.md), as well as the strategies for [Partitioning](../../../table-design/data-partitioning/dynamic-partitioning.md) and [Bucketing](../../../table-design/data-partitioning/data-bucketing.md) according to the table structure of BigQuery. For more table creation strategies, please refer to [Load Best Practices](../load-best-practices.md). +2. When BigQuery exports data in JSON type, it does not support exporting in Parquet format. You can export it in JSON format instead. 
+3. When BigQuery exports data of the Time type, it is necessary to export it after casting it to the String type. + +## Data type mapping + +| BigQuery | Doris | Comment | +| ------------------ | -------------- | -------------------- | +| Array | Array | | +| BOOLEAN | BOOLEAN | | +| DATE | DATE | | +| DATETIME/TIMESTAMP | DATETIME | | +| JSON | JSON | | +| INT64 | BIGINT | | +| NUMERIC | DECIMAL | | +| FLOAT64 | DOUBLE | | +| STRING | VARCHAR/STRING | VARCHAR maximum length is 65535 | +| STRUCT | STRUCT | | +| TIME | STRING | | +| OTHER | UNSUPPORTED | | + +## 1. Create Table + +When migrating a BigQuery table to Doris, it is necessary to create a Doris table first. + +Suppose we already have the following table and data in BigQuery. + +```SQL +CREATE OR REPLACE TABLE test.sales_data ( + order_id INT64, + customer_name STRING, + order_date DATE, + amount NUMERIC(10,2), + country STRING +) +PARTITION BY order_date + + +INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES +(1, 'Alice', '2025-04-08', 99.99, 'USA'), +(2, 'Bob', '2025-04-08', 149.50, 'Canada'), +(3, 'Charlie', '2025-04-09', 75.00, 'UK'), +(4, 'Diana', '2025-04-10', 200.00, 'Australia'); +``` + +According to this table structure, a Doris primary key partitioned table can be created. The partition field should be the same as that in BigQuery, and the table should be partitioned on a daily basis. + +```sql +CREATE TABLE `sales_data` ( + order_id INT, + order_date DATE NOT NULL, + customer_name VARCHAR(128), + amount DECIMAL(10,2), + country VARCHAR(48) +) ENGINE=OLAP +UNIQUE KEY(`order_id`,`order_date`) +PARTITION BY RANGE(`order_date`) () +DISTRIBUTED BY HASH(`order_id`) BUCKETS 16 +PROPERTIES ( + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "DAY", + "dynamic_partition.start" = "-10", + "dynamic_partition.end" = "10", + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "16", + "replication_num" = "1" +); +``` + +## 2. Export BigQuery Data + +1. **Export to GCS Parquet format file through Export method** + ```sql + EXPORT DATA + OPTIONS ( + uri = 'gs://mybucket/export/sales_data/*.parquet', + format = 'PARQUET') + AS ( + SELECT * + FROM test.sales_data + ); + ``` +2. **View the exported files on GCS** + The above command will export the data of sales_data to GCS, and each partition will generate one or more files with increasing file names. For details, please refer to [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files), as follows: +  + + +## 3. Load Data to Doris + +S3 Load is an asynchronous data load method. After execution, Doris actively pulls data from the data source. The data source supports object storage compatible with the S3 protocol, including ([AWS S3](./amazon-s3.md),[GCS](./google-cloud-storage.md),[AZURE](./azure-storage.md),etc) + +This method is suitable for scenarios involving large volumes of data that require asynchronous processing in the background. For data imports that need to be handled synchronously, refer to [TVF Load](./amazon-s3.md#load-with-tvf)。 + +*Note: For **Parquet/ORC format files that contain complex types (Struct/Array/Map)**, TVF Load must be used.* + +1. 
**Loading data from a single file** + + ```sql + LOAD LABEL sales_data_2025_04_08 + ( + DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet") + INTO TABLE sales_data + FORMAT AS "parquet" + (order_id, order_date, customer_name, amount, country) + ) + WITH S3 + ( + "provider" = "GCP", + "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com", + "s3.region" = "asia-southeast1", + "s3.access_key" = "<ak>", + "s3.secret_key" = "<sk>" + ); + ``` + +2. **Check Load Status via SHOW LOAD** + + Since S3 Load import is submitted asynchronously, you can check the status of a specific label using SHOW LOAD: + + ```yaml + mysql> show load where label = "label_sales_data_2025_04_08"\G + *************************** 1. row *************************** + JobId: 17956078 + Label: label_sales_data_2025_04_08 + State: FINISHED + Progress: 100.00% (1/1) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: NULL + CreateTime: 2025-04-10 17:50:53 + EtlStartTime: 2025-04-10 17:50:54 + EtlFinishTime: 2025-04-10 17:50:54 + LoadStartTime: 2025-04-10 17:50:54 + LoadFinishTime: 2025-04-10 17:50:54 + URL: NULL + JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620} + TransactionId: 766228 + ErrorTablets: {} + User: root + Comment: + 1 row in set (0.00 sec) + ``` + +3. **Handle Load Errors** + + When there are multiple load tasks, you can use the following statement to query the dates and reasons for data load failures. + + ```yaml + mysql> show load where state='CANCELLED' and label like "label_test%"\G + *************************** 1. row *************************** + JobId: 18312384 + Label: label_test123 + State: CANCELLED + Progress: 100.00% (3/3) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel + CreateTime: 2025-04-15 17:32:59 + EtlStartTime: 2025-04-15 17:33:02 + EtlFinishTime: 2025-04-15 17:33:02 + LoadStartTime: 2025-04-15 17:33:02 + LoadFinishTime: 2025-04-15 17:33:02 + URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342 + JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839} + TransactionId: 769213 + ErrorTablets: {} + User: root + Comment: + ``` + + As shown in the example above, the issue is a **data quality error**(ETL_QUALITY_UNSATISFIED). To view the detailed error, you need to visit the URL provided in the result. For example, the data exceeded the defined length of the country column in the table schema: + + ```python + [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342" + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . 
src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line []; + ``` + + For data quality errors, if you want to allow skipping erroneous records, you can set a fault tolerance rate in the Properties section of the S3 Load task. For details, refer to [Load Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations)。 + +1. **Loading data from multiple files** + + When migrating a large volume of historical data, it is recommended to use a batch load strategy. Each batch corresponds to one or a few partitions in Doris. It is recommended to keep the data size under 100GB per batch to reduce system load and lower the cost of retries in case of load failures. + + You can refer to the script [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh), which can split the file list under the specified directory on the object storage and submit multiple S3 Load tasks to Doris in batches to achieve the effect of batch load. diff --git a/versioned_docs/version-3.0/data-operate/import/data-source/bigquery.md b/versioned_docs/version-3.0/data-operate/import/data-source/bigquery.md new file mode 100644 index 00000000000..417535e3902 --- /dev/null +++ b/versioned_docs/version-3.0/data-operate/import/data-source/bigquery.md @@ -0,0 +1,220 @@ +--- +{ + "title": "BigQuery", + "language": "en" +} +--- + +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +During the process of migrating BigQuery, it is usually necessary to use object storage as an intermediate medium. The core process is as follows: First, use BigQuery's [Export](https://cloud.google.com/bigquery/docs/exporting-data) statement to export data to GCS (Google Cloud Storage); then, use Doris's S3 Load function to read data from the object storage and load it into Doris. For details, please refer to [S3 Load](./amazon-s3.md). + + +## Considerations + +1. Before the migration, it is necessary to select Doris' [Data Model](../../../table-design/data-model/overview.md), as well as the strategies for [Partitioning](../../../table-design/data-partitioning/dynamic-partitioning.md) and [Bucketing](../../../table-design/data-partitioning/data-bucketing.md) according to the table structure of BigQuery. For more table creation strategies, please refer to [Load Best Practices](../load-best-practices.md). +2. 
When BigQuery exports data in JSON type, it does not support exporting in Parquet format. You can export it in JSON format instead. +3. When BigQuery exports data of the Time type, it is necessary to export it after casting it to the String type. + +## Data type mapping + +| BigQuery | Doris | Comment | +| ------------------ | -------------- | -------------------- | +| Array | Array | | +| BOOLEAN | BOOLEAN | | +| DATE | DATE | | +| DATETIME/TIMESTAMP | DATETIME | | +| JSON | JSON | | +| INT64 | BIGINT | | +| NUMERIC | DECIMAL | | +| FLOAT64 | DOUBLE | | +| STRING | VARCHAR/STRING | VARCHAR maximum length is 65535 | +| STRUCT | STRUCT | | +| TIME | STRING | | +| OTHER | UNSUPPORTED | | + +## 1. Create Table + +When migrating a BigQuery table to Doris, it is necessary to create a Doris table first. + +Suppose we already have the following table and data in BigQuery. + +```SQL +CREATE OR REPLACE TABLE test.sales_data ( + order_id INT64, + customer_name STRING, + order_date DATE, + amount NUMERIC(10,2), + country STRING +) +PARTITION BY order_date + + +INSERT INTO test.sales_data (order_id, customer_name, order_date, amount, country) VALUES +(1, 'Alice', '2025-04-08', 99.99, 'USA'), +(2, 'Bob', '2025-04-08', 149.50, 'Canada'), +(3, 'Charlie', '2025-04-09', 75.00, 'UK'), +(4, 'Diana', '2025-04-10', 200.00, 'Australia'); +``` + +According to this table structure, a Doris primary key partitioned table can be created. The partition field should be the same as that in BigQuery, and the table should be partitioned on a daily basis. + +```sql +CREATE TABLE `sales_data` ( + order_id INT, + order_date DATE NOT NULL, + customer_name VARCHAR(128), + amount DECIMAL(10,2), + country VARCHAR(48) +) ENGINE=OLAP +UNIQUE KEY(`order_id`,`order_date`) +PARTITION BY RANGE(`order_date`) () +DISTRIBUTED BY HASH(`order_id`) BUCKETS 16 +PROPERTIES ( + "dynamic_partition.enable" = "true", + "dynamic_partition.time_unit" = "DAY", + "dynamic_partition.start" = "-10", + "dynamic_partition.end" = "10", + "dynamic_partition.prefix" = "p", + "dynamic_partition.buckets" = "16", + "replication_num" = "1" +); +``` + +## 2. Export BigQuery Data + +1. **Export to GCS Parquet format file through Export method** + ```sql + EXPORT DATA + OPTIONS ( + uri = 'gs://mybucket/export/sales_data/*.parquet', + format = 'PARQUET') + AS ( + SELECT * + FROM test.sales_data + ); + ``` +2. **View the exported files on GCS** + The above command will export the data of sales_data to GCS, and each partition will generate one or more files with increasing file names. For details, please refer to [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files), as follows: +  + + +## 3. Load Data to Doris + +S3 Load is an asynchronous data load method. After execution, Doris actively pulls data from the data source. The data source supports object storage compatible with the S3 protocol, including ([AWS S3](./amazon-s3.md),[GCS](./google-cloud-storage.md),[AZURE](./azure-storage.md),etc) + +This method is suitable for scenarios involving large volumes of data that require asynchronous processing in the background. For data imports that need to be handled synchronously, refer to [TVF Load](./amazon-s3.md#load-with-tvf)。 + +*Note: For **Parquet/ORC format files that contain complex types (Struct/Array/Map)**, TVF Load must be used.* + +1. 
**Loading data from a single file** + + ```sql + LOAD LABEL sales_data_2025_04_08 + ( + DATA INFILE("s3://mybucket/export/sales_data/000000000000.parquet") + INTO TABLE sales_data + FORMAT AS "parquet" + (order_id, order_date, customer_name, amount, country) + ) + WITH S3 + ( + "provider" = "GCP", + "s3.endpoint" = "storage.asia-southeast1.rep.googleapis.com", + "s3.region" = "asia-southeast1", + "s3.access_key" = "<ak>", + "s3.secret_key" = "<sk>" + ); + ``` + +2. **Check Load Status via SHOW LOAD** + + Since S3 Load import is submitted asynchronously, you can check the status of a specific label using SHOW LOAD: + + ```yaml + mysql> show load where label = "label_sales_data_2025_04_08"\G + *************************** 1. row *************************** + JobId: 17956078 + Label: label_sales_data_2025_04_08 + State: FINISHED + Progress: 100.00% (1/1) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=2 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):3600; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: NULL + CreateTime: 2025-04-10 17:50:53 + EtlStartTime: 2025-04-10 17:50:54 + EtlFinishTime: 2025-04-10 17:50:54 + LoadStartTime: 2025-04-10 17:50:54 + LoadFinishTime: 2025-04-10 17:50:54 + URL: NULL + JobDetails: {"Unfinished backends":{"5eec1be8612d4872-91040ff1e7208a4f":[]},"ScannedRows":2,"TaskNumber":1,"LoadBytes":91,"All backends":{"5eec1be8612d4872-91040ff1e7208a4f":[10022]},"FileNumber":1,"FileSize":1620} + TransactionId: 766228 + ErrorTablets: {} + User: root + Comment: + 1 row in set (0.00 sec) + ``` + +3. **Handle Load Errors** + + When there are multiple load tasks, you can use the following statement to query the dates and reasons for data load failures. + + ```yaml + mysql> show load where state='CANCELLED' and label like "label_test%"\G + *************************** 1. row *************************** + JobId: 18312384 + Label: label_test123 + State: CANCELLED + Progress: 100.00% (3/3) + Type: BROKER + EtlInfo: unselected.rows=0; dpp.abnorm.ALL=4; dpp.norm.ALL=0 + TaskInfo: cluster:storage.asia-southeast1.rep.googleapis.com; timeout(s):14400; max_filter_ratio:0.0; priority:NORMAL + ErrorMsg: type:ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel + CreateTime: 2025-04-15 17:32:59 + EtlStartTime: 2025-04-15 17:33:02 + EtlFinishTime: 2025-04-15 17:33:02 + LoadStartTime: 2025-04-15 17:33:02 + LoadFinishTime: 2025-04-15 17:33:02 + URL: http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342 + JobDetails: {"Unfinished backends":{"7602ccd7c3a4854-95307efca7bfe341":[]},"ScannedRows":4,"TaskNumber":1,"LoadBytes":188,"All backends":{"7602ccd7c3a4854-95307efca7bfe341":[10022]},"FileNumber":3,"FileSize":4839} + TransactionId: 769213 + ErrorTablets: {} + User: root + Comment: + ``` + + As shown in the example above, the issue is a **data quality error**(ETL_QUALITY_UNSATISFIED). To view the detailed error, you need to visit the URL provided in the result. For example, the data exceeded the defined length of the country column in the table schema: + + ```python + [root@VM-10-6-centos ~]$ curl "http://10.16.10.6:28747/api/_load_error_log?file=__shard_2/error_log_insert_stmt_7602ccd7c3a4854-95307efca7bfe342_7602ccd7c3a4854_95307efca7bfe342" + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [USA] schema length: 1; actual length: 3; . 
src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Canada] schema length: 1; actual length: 6; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [UK] schema length: 1; actual length: 2; . src line []; + Reason: column_name[country], the length of input is too long than schema. first 32 bytes of input str: [Australia] schema length: 1; actual length: 9; . src line []; + ``` + + For data quality errors, if you want to allow skipping erroneous records, you can set a fault tolerance rate in the Properties section of the S3 Load task. For details, refer to [Load Configuration Parameters](../../import/import-way/broker-load-manual.md#related-configurations)。 + +1. **Loading data from multiple files** + + When migrating a large volume of historical data, it is recommended to use a batch load strategy. Each batch corresponds to one or a few partitions in Doris. It is recommended to keep the data size under 100GB per batch to reduce system load and lower the cost of retries in case of load failures. + + You can refer to the script [s3_load_file_demo.sh](https://github.com/apache/doris/blob/master/samples/load/shell/s3_load_file_demo.sh), which can split the file list under the specified directory on the object storage and submit multiple S3 Load tasks to Doris in batches to achieve the effect of batch load. diff --git a/versioned_sidebars/version-2.1-sidebars.json b/versioned_sidebars/version-2.1-sidebars.json index 6069ea8f7f1..c8453959e32 100644 --- a/versioned_sidebars/version-2.1-sidebars.json +++ b/versioned_sidebars/version-2.1-sidebars.json @@ -134,6 +134,7 @@ "data-operate/import/data-source/minio", "data-operate/import/data-source/s3-compatible", "data-operate/import/data-source/snowflake", + "data-operate/import/data-source/bigquery", "data-operate/import/data-source/migrate-data-from-other-olap", "data-operate/import/data-source/migrate-data-from-other-oltp" ] diff --git a/versioned_sidebars/version-3.0-sidebars.json b/versioned_sidebars/version-3.0-sidebars.json index a17d533c196..2fd6f21b618 100644 --- a/versioned_sidebars/version-3.0-sidebars.json +++ b/versioned_sidebars/version-3.0-sidebars.json @@ -158,6 +158,7 @@ "data-operate/import/data-source/minio", "data-operate/import/data-source/s3-compatible", "data-operate/import/data-source/snowflake", + "data-operate/import/data-source/bigquery", "data-operate/import/data-source/migrate-data-from-other-olap", "data-operate/import/data-source/migrate-data-from-other-oltp" ] --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org