This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 4c48873fe03 [opt] opt http tvf doc (#3431)
4c48873fe03 is described below
commit 4c48873fe0395214efec09d350565cbe22b0cba8
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Thu Mar 5 09:46:06 2026 +0800
[opt] opt http tvf doc (#3431)
## Versions
- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [x] Checked by AI
- [ ] Test Cases Built
---
.../sql-functions/table-valued-functions/http.md | 105 +++++++++++++--------
.../sql-functions/table-valued-functions/http.md | 105 +++++++++++++--------
.../sql-functions/table-valued-functions/http.md | 105 +++++++++++++--------
.../sql-functions/table-valued-functions/http.md | 105 +++++++++++++--------
4 files changed, 260 insertions(+), 160 deletions(-)
diff --git a/docs/sql-manual/sql-functions/table-valued-functions/http.md b/docs/sql-manual/sql-functions/table-valued-functions/http.md
index bcabcc28a88..1523f76bd1a 100644
--- a/docs/sql-manual/sql-functions/table-valued-functions/http.md
+++ b/docs/sql-manual/sql-functions/table-valued-functions/http.md
@@ -2,14 +2,20 @@
{
"title": "HTTP",
"language": "en",
- "description": "HTTP table-valued-function (tvf) allows users to read and access file content on HTTP paths as if accessing relational table format data."
+ "description": "Apache Doris HTTP table-valued function (TVF) enables direct SQL queries on any HTTP/HTTPS endpoint data, including REST API responses, remote data files, and Hugging Face datasets. Supports JSON, CSV, Parquet, ORC format parsing."
}
---
-HTTP table-valued-function (tvf) allows users to read and access file content on HTTP paths as if accessing relational table format data. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file formats.
+HTTP table-valued-function (TVF) allows users to read data returned from any HTTP endpoint as if accessing relational table format data. As long as the returned data is in a supported format, it can be queried and analyzed directly via SQL. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` data formats.
+
+Typical use cases include:
+
+- Querying data files hosted on HTTP/HTTPS (e.g., GitHub, S3, etc.).
+- Directly querying HTTP API endpoints that return JSON-formatted data.
+- Accessing datasets hosted on Hugging Face.
:::note
-Supported since 4.0.2
+Supported since version 4.0.2.
:::
## Syntax
@@ -26,52 +32,71 @@ HTTP(
| Parameter | Description |
|-------------------|------------------------------|
-| uri | HTTP address for access. Supports `http`, `https` and `hf` protocols.|
-| format | File format, supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` |
+| uri | The HTTP address to access. Supports `http`, `https`, and `hf` protocols. Can be a URL to a data file or an API endpoint that returns data. |
+| format | Data format, i.e., how the content returned by the HTTP endpoint is parsed. Supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`. |
-About `hf://`(Hugging Face), please see [Analyzing Hugging Face Data](../../../lakehouse/huggingface.md).
+For `hf://` (Hugging Face), please refer to [Analyzing Hugging Face Data](../../../lakehouse/huggingface.md).
### Optional Parameters
-| Parameter | Description | Notes |
+| Parameter | Description | Notes |
|-------|-----------|------------------------|
-| `http.header.xxx` | Used to specify arbitrary HTTP Headers, which will be directly passed to the HTTP Client. For example `"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`, the final Header will be `Authorization: Bearer hf_MWYzOJJoZEymb...` |
-| `http.enable.range.request` | Whether to use range request to access HTTP service. Default is `true`.|
-| `http.max.request.size.bytes` | Maximum access size limit when using non-range request mode. Default is 100MB |
+| `http.header.xxx` | Used to specify arbitrary HTTP Headers, which are passed directly to the HTTP Client. | e.g., `"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`, the resulting Header will be `Authorization: Bearer hf_MWYzOJJoZEymb...` |
+| `http.enable.range.request` | Whether to use range requests to access the HTTP service. Default is `true`. | |
+| `http.max.request.size.bytes` | Maximum access size limit when using non-range request mode. Default is 100 MB. | |
-When `http.enable.range.request` is `true`, the system will first try to access the HTTP service using range request. If the HTTP service does not support range request, it will automatically fall back to non-range request mode. And the maximum access data size is limited by `http.max.request.size.bytes`.
+When `http.enable.range.request` is `true`, the system will first attempt to access the HTTP service using range requests. If the HTTP service does not support range requests, it will automatically fall back to non-range request mode. The maximum data access size is limited by `http.max.request.size.bytes`.
## Examples
+### Reading Data Files over HTTP
+
- Read CSV data from GitHub
- ```sql
- SELECT COUNT(*) FROM
- HTTP(
- "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
- "format" = "csv",
- "column_separator" = ","
- );
- ```
-
-- Access Parquet data from GitHub
-
- ```sql
- SELECT arr_map, id FROM
- HTTP(
- "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
- "format" = "parquet"
- );
- ```
-
-- Access JSON data from GitHub and use with `desc function`
-
- ```sql
- DESC FUNCTION
- HTTP(
- "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
- "format" = "json",
- "strip_outer_array" = "true"
- );
- ```
+ ```sql
+ SELECT COUNT(*) FROM
+ HTTP(
+ "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
+ "format" = "csv",
+ "column_separator" = ","
+ );
+ ```
+
+- Read Parquet data from GitHub
+
+ ```sql
+ SELECT arr_map, id FROM
+ HTTP(
+ "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
+ "format" = "parquet"
+ );
+ ```
+
+- Read JSON data from GitHub and use `DESC FUNCTION` to view the schema
+
+ ```sql
+ DESC FUNCTION
+ HTTP(
+ "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
+ "format" = "json",
+ "strip_outer_array" = "true"
+ );
+ ```
+### Querying HTTP API Endpoints
+
+You can use the HTTP table function to directly query HTTP API endpoints that return JSON-formatted data. For example, querying a REST API that returns JSON data:
+
+```sql
+SELECT * FROM
+HTTP(
+ "uri" = "https://api.example.com/v1/data",
+ "format" = "json",
+ "http.header.Authorization" = "Bearer your_token",
+ "strip_outer_array" = "true"
+);
+```
+
+:::tip
+For HTTP API endpoints that do not support Range Requests, the system will automatically fall back to non-Range Request mode. You can manually disable Range Requests via the `http.enable.range.request` parameter.
+:::
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/http.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/http.md
index 7b38ccbde8b..4041dc37d6c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/http.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/table-valued-functions/http.md
@@ -2,11 +2,17 @@
{
"title": "HTTP",
"language": "zh-CN",
- "description": "HTTP
表函数(table-valued-function,tvf),可以让用户像访问关系表格式数据一样,读取并访问 HTTP 路径上的文件内容。目前支持
csv/csvwithnames/csvwithnamesandtypes/json/parquet/orc 文件格式。"
+ "description": "Apache Doris HTTP 表函数(TVF)支持通过 SQL 直接查询任意 HTTP/HTTPS
端点数据,包括 REST API 接口、远程数据文件及 Hugging Face 数据集。支持 JSON、CSV、Parquet、ORC 等格式解析。"
}
---
-HTTP 表函数(table-valued-function,tvf),可以让用户像访问关系表格式数据一样,读取并访问 HTTP 路径上的文件内容。目前支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` 文件格式。
+HTTP 表函数(table-valued-function,tvf),可以让用户像访问关系表格式数据一样,读取任意 HTTP
端点返回的数据。只要返回的数据满足支持的格式,即可直接通过 SQL 进行查询和分析。目前支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` 数据格式。
+
+典型使用场景包括:
+
+- 查询 HTTP/HTTPS 上托管的数据文件(如 GitHub、S3 等)。
+- 直接查询返回 JSON 格式数据的 HTTP API 接口。
+- 访问 Hugging Face 上托管的数据集。
:::note
该函数自 4.0.2 版本支持。
@@ -26,52 +32,71 @@ HTTP(
| 参数 | 描述 |
|-------------------|----------------------------|
-| uri | 用于访问的 HTTP 地址。支持 `http`,`https` he `hf` 协议。|
-| format | 文件格式,支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` |
-
+| uri | 访问的 HTTP 地址。支持 `http`、`https` 和 `hf` 协议。可以是数据文件的
URL,也可以是返回数据的 API 端点。|
+| format | 数据格式,即 HTTP 端点返回内容的解析方式。支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`。 |
-关于 `hf://`(Hugging Face), 请参阅 [Analyzing Hugging Face
Data](../../../lakehouse/huggingface.md)。
+关于 `hf://`(Hugging Face),请参阅 [Analyzing Hugging Face
Data](../../../lakehouse/huggingface.md)。
### 可选参数
| 参数 | 描述 | 备注 |
|-------|-----------|------------------------|
-| `http.header.xxx` | 用于指定任意的 HTTP Header,这些信息会直接透传给 HTTP Client。如
`"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`,最终 Header 为
`Authorization: Bearer hf_MWYzOJJoZEymb...` |
-| `http.enable.range.request` | 是否使用 range request 访问 HTTP 服务。默认为 `true`。|
-| `http.max.request.size.bytes` | 当使用非 range request 方式访问时,最大访问大小限制。默认是 100MB |
+| `http.header.xxx` | 用于指定任意的 HTTP Header,这些信息会直接透传给 HTTP Client。 | 如
`"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`,最终 Header 为
`Authorization: Bearer hf_MWYzOJJoZEymb...` |
+| `http.enable.range.request` | 是否使用 range request 访问 HTTP 服务。默认为 `true`。 | |
+| `http.max.request.size.bytes` | 当使用非 range request 方式访问时,最大访问大小限制。默认是 100
MB。 | |
当 `http.enable.range.request` 为 `true` 时,系统会优先尝试使用 range request 访问 HTTP 服务。如果
HTTP 服务不支持 range request,则会自动回退到非 range request 方式访问。并且最大访问数据量受到
`http.max.request.size.bytes` 限制。
## 示例
-- 读取 github 上的 csv 数据
-
- ```sql
- SELECT COUNT(*) FROM
- HTTP(
- "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
- "format" = "csv",
- "column_separator" = ","
- );
- ```
-
-- 访问 github 上的 parquet 数据
-
- ```sql
- SELECT arr_map, id FROM
- HTTP(
- "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
- "format" = "parquet"
- );
- ```
-
-- 访问 github 上的 json 数据,并配合 `desc function` 使用
-
- ```sql
- DESC FUNCTION
- HTTP(
- "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
- "format" = "json",
- "strip_outer_array" = "true"
- );
- ```
+### 读取 HTTP 上的数据文件
+
+- 读取 GitHub 上的 CSV 数据
+
+ ```sql
+ SELECT COUNT(*) FROM
+ HTTP(
+ "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
+ "format" = "csv",
+ "column_separator" = ","
+ );
+ ```
+
+- 读取 GitHub 上的 Parquet 数据
+
+ ```sql
+ SELECT arr_map, id FROM
+ HTTP(
+ "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
+ "format" = "parquet"
+ );
+ ```
+
+- 读取 GitHub 上的 JSON 数据,并配合 `DESC FUNCTION` 查看 Schema
+
+ ```sql
+ DESC FUNCTION
+ HTTP(
+ "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
+ "format" = "json",
+ "strip_outer_array" = "true"
+ );
+ ```
+
+### 查询 HTTP API 接口
+
+通过 HTTP 表函数,可以直接查询返回 JSON 格式数据的 HTTP API 接口。例如,查询一个返回 JSON 数据的 REST API:
+
+```sql
+SELECT * FROM
+HTTP(
+ "uri" = "https://api.example.com/v1/data",
+ "format" = "json",
+ "http.header.Authorization" = "Bearer your_token",
+ "strip_outer_array" = "true"
+);
+```
+
+:::tip
+对于不支持 Range Request 的 HTTP API 接口,系统会自动回退到非 Range Request 方式访问。可通过
`http.enable.range.request` 参数手动关闭 Range Request。
+:::
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md
index 7b38ccbde8b..4041dc37d6c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md
@@ -2,11 +2,17 @@
{
"title": "HTTP",
"language": "zh-CN",
- "description": "HTTP
表函数(table-valued-function,tvf),可以让用户像访问关系表格式数据一样,读取并访问 HTTP 路径上的文件内容。目前支持
csv/csvwithnames/csvwithnamesandtypes/json/parquet/orc 文件格式。"
+ "description": "Apache Doris HTTP 表函数(TVF)支持通过 SQL 直接查询任意 HTTP/HTTPS
端点数据,包括 REST API 接口、远程数据文件及 Hugging Face 数据集。支持 JSON、CSV、Parquet、ORC 等格式解析。"
}
---
-HTTP 表函数(table-valued-function,tvf),可以让用户像访问关系表格式数据一样,读取并访问 HTTP 路径上的文件内容。目前支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` 文件格式。
+HTTP 表函数(table-valued-function,tvf),可以让用户像访问关系表格式数据一样,读取任意 HTTP
端点返回的数据。只要返回的数据满足支持的格式,即可直接通过 SQL 进行查询和分析。目前支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` 数据格式。
+
+典型使用场景包括:
+
+- 查询 HTTP/HTTPS 上托管的数据文件(如 GitHub、S3 等)。
+- 直接查询返回 JSON 格式数据的 HTTP API 接口。
+- 访问 Hugging Face 上托管的数据集。
:::note
该函数自 4.0.2 版本支持。
@@ -26,52 +32,71 @@ HTTP(
| 参数 | 描述 |
|-------------------|----------------------------|
-| uri | 用于访问的 HTTP 地址。支持 `http`,`https` he `hf` 协议。|
-| format | 文件格式,支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` |
-
+| uri | 访问的 HTTP 地址。支持 `http`、`https` 和 `hf` 协议。可以是数据文件的
URL,也可以是返回数据的 API 端点。|
+| format | 数据格式,即 HTTP 端点返回内容的解析方式。支持
`csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`。 |
-关于 `hf://`(Hugging Face), 请参阅 [Analyzing Hugging Face
Data](../../../lakehouse/huggingface.md)。
+关于 `hf://`(Hugging Face),请参阅 [Analyzing Hugging Face
Data](../../../lakehouse/huggingface.md)。
### 可选参数
| 参数 | 描述 | 备注 |
|-------|-----------|------------------------|
-| `http.header.xxx` | 用于指定任意的 HTTP Header,这些信息会直接透传给 HTTP Client。如
`"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`,最终 Header 为
`Authorization: Bearer hf_MWYzOJJoZEymb...` |
-| `http.enable.range.request` | 是否使用 range request 访问 HTTP 服务。默认为 `true`。|
-| `http.max.request.size.bytes` | 当使用非 range request 方式访问时,最大访问大小限制。默认是 100MB |
+| `http.header.xxx` | 用于指定任意的 HTTP Header,这些信息会直接透传给 HTTP Client。 | 如
`"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`,最终 Header 为
`Authorization: Bearer hf_MWYzOJJoZEymb...` |
+| `http.enable.range.request` | 是否使用 range request 访问 HTTP 服务。默认为 `true`。 | |
+| `http.max.request.size.bytes` | 当使用非 range request 方式访问时,最大访问大小限制。默认是 100
MB。 | |
当 `http.enable.range.request` 为 `true` 时,系统会优先尝试使用 range request 访问 HTTP 服务。如果
HTTP 服务不支持 range request,则会自动回退到非 range request 方式访问。并且最大访问数据量受到
`http.max.request.size.bytes` 限制。
## 示例
-- 读取 github 上的 csv 数据
-
- ```sql
- SELECT COUNT(*) FROM
- HTTP(
- "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
- "format" = "csv",
- "column_separator" = ","
- );
- ```
-
-- 访问 github 上的 parquet 数据
-
- ```sql
- SELECT arr_map, id FROM
- HTTP(
- "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
- "format" = "parquet"
- );
- ```
-
-- 访问 github 上的 json 数据,并配合 `desc function` 使用
-
- ```sql
- DESC FUNCTION
- HTTP(
- "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
- "format" = "json",
- "strip_outer_array" = "true"
- );
- ```
+### 读取 HTTP 上的数据文件
+
+- 读取 GitHub 上的 CSV 数据
+
+ ```sql
+ SELECT COUNT(*) FROM
+ HTTP(
+ "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
+ "format" = "csv",
+ "column_separator" = ","
+ );
+ ```
+
+- 读取 GitHub 上的 Parquet 数据
+
+ ```sql
+ SELECT arr_map, id FROM
+ HTTP(
+ "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
+ "format" = "parquet"
+ );
+ ```
+
+- 读取 GitHub 上的 JSON 数据,并配合 `DESC FUNCTION` 查看 Schema
+
+ ```sql
+ DESC FUNCTION
+ HTTP(
+ "uri" =
"https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
+ "format" = "json",
+ "strip_outer_array" = "true"
+ );
+ ```
+
+### 查询 HTTP API 接口
+
+通过 HTTP 表函数,可以直接查询返回 JSON 格式数据的 HTTP API 接口。例如,查询一个返回 JSON 数据的 REST API:
+
+```sql
+SELECT * FROM
+HTTP(
+ "uri" = "https://api.example.com/v1/data",
+ "format" = "json",
+ "http.header.Authorization" = "Bearer your_token",
+ "strip_outer_array" = "true"
+);
+```
+
+:::tip
+对于不支持 Range Request 的 HTTP API 接口,系统会自动回退到非 Range Request 方式访问。可通过
`http.enable.range.request` 参数手动关闭 Range Request。
+:::
diff --git a/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md b/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md
index bcabcc28a88..1523f76bd1a 100644
--- a/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md
+++ b/versioned_docs/version-4.x/sql-manual/sql-functions/table-valued-functions/http.md
@@ -2,14 +2,20 @@
{
"title": "HTTP",
"language": "en",
- "description": "HTTP table-valued-function (tvf) allows users to read and access file content on HTTP paths as if accessing relational table format data."
+ "description": "Apache Doris HTTP table-valued function (TVF) enables direct SQL queries on any HTTP/HTTPS endpoint data, including REST API responses, remote data files, and Hugging Face datasets. Supports JSON, CSV, Parquet, ORC format parsing."
}
---
-HTTP table-valued-function (tvf) allows users to read and access file content on HTTP paths as if accessing relational table format data. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file formats.
+HTTP table-valued-function (TVF) allows users to read data returned from any HTTP endpoint as if accessing relational table format data. As long as the returned data is in a supported format, it can be queried and analyzed directly via SQL. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` data formats.
+
+Typical use cases include:
+
+- Querying data files hosted on HTTP/HTTPS (e.g., GitHub, S3, etc.).
+- Directly querying HTTP API endpoints that return JSON-formatted data.
+- Accessing datasets hosted on Hugging Face.
:::note
-Supported since 4.0.2
+Supported since version 4.0.2.
:::
## Syntax
@@ -26,52 +32,71 @@ HTTP(
| Parameter | Description |
|-------------------|------------------------------|
-| uri | HTTP address for access. Supports `http`, `https` and `hf` protocols.|
-| format | File format, supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` |
+| uri | The HTTP address to access. Supports `http`, `https`, and `hf` protocols. Can be a URL to a data file or an API endpoint that returns data. |
+| format | Data format, i.e., how the content returned by the HTTP endpoint is parsed. Supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`. |
-About `hf://`(Hugging Face), please see [Analyzing Hugging Face Data](../../../lakehouse/huggingface.md).
+For `hf://` (Hugging Face), please refer to [Analyzing Hugging Face Data](../../../lakehouse/huggingface.md).
### Optional Parameters
-| Parameter | Description | Notes |
+| Parameter | Description | Notes |
|-------|-----------|------------------------|
-| `http.header.xxx` | Used to specify arbitrary HTTP Headers, which will be directly passed to the HTTP Client. For example `"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`, the final Header will be `Authorization: Bearer hf_MWYzOJJoZEymb...` |
-| `http.enable.range.request` | Whether to use range request to access HTTP service. Default is `true`.|
-| `http.max.request.size.bytes` | Maximum access size limit when using non-range request mode. Default is 100MB |
+| `http.header.xxx` | Used to specify arbitrary HTTP Headers, which are passed directly to the HTTP Client. | e.g., `"http.header.Authorization" = "Bearer hf_MWYzOJJoZEymb..."`, the resulting Header will be `Authorization: Bearer hf_MWYzOJJoZEymb...` |
+| `http.enable.range.request` | Whether to use range requests to access the HTTP service. Default is `true`. | |
+| `http.max.request.size.bytes` | Maximum access size limit when using non-range request mode. Default is 100 MB. | |
-When `http.enable.range.request` is `true`, the system will first try to access the HTTP service using range request. If the HTTP service does not support range request, it will automatically fall back to non-range request mode. And the maximum access data size is limited by `http.max.request.size.bytes`.
+When `http.enable.range.request` is `true`, the system will first attempt to access the HTTP service using range requests. If the HTTP service does not support range requests, it will automatically fall back to non-range request mode. The maximum data access size is limited by `http.max.request.size.bytes`.
## Examples
+### Reading Data Files over HTTP
+
- Read CSV data from GitHub
- ```sql
- SELECT COUNT(*) FROM
- HTTP(
- "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
- "format" = "csv",
- "column_separator" = ","
- );
- ```
-
-- Access Parquet data from GitHub
-
- ```sql
- SELECT arr_map, id FROM
- HTTP(
- "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
- "format" = "parquet"
- );
- ```
-
-- Access JSON data from GitHub and use with `desc function`
-
- ```sql
- DESC FUNCTION
- HTTP(
- "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
- "format" = "json",
- "strip_outer_array" = "true"
- );
- ```
+ ```sql
+ SELECT COUNT(*) FROM
+ HTTP(
+ "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/http_stream/all_types.csv",
+ "format" = "csv",
+ "column_separator" = ","
+ );
+ ```
+
+- Read Parquet data from GitHub
+
+ ```sql
+ SELECT arr_map, id FROM
+ HTTP(
+ "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/external_table_p0/tvf/t.parquet",
+ "format" = "parquet"
+ );
+ ```
+
+- Read JSON data from GitHub and use `DESC FUNCTION` to view the schema
+
+ ```sql
+ DESC FUNCTION
+ HTTP(
+ "uri" = "https://raw.githubusercontent.com/apache/doris/refs/heads/master/regression-test/data/load_p0/stream_load/basic_data.json",
+ "format" = "json",
+ "strip_outer_array" = "true"
+ );
+ ```
+### Querying HTTP API Endpoints
+
+You can use the HTTP table function to directly query HTTP API endpoints that return JSON-formatted data. For example, querying a REST API that returns JSON data:
+
+```sql
+SELECT * FROM
+HTTP(
+ "uri" = "https://api.example.com/v1/data",
+ "format" = "json",
+ "http.header.Authorization" = "Bearer your_token",
+ "strip_outer_array" = "true"
+);
+```
+
+:::tip
+For HTTP API endpoints that do not support Range Requests, the system will automatically fall back to non-Range Request mode. You can manually disable Range Requests via the `http.enable.range.request` parameter.
+:::
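
The tip in this diff says Range Requests can be disabled manually but does not show a query that does so. A minimal sketch of what that might look like, reusing the placeholder endpoint and token from the doc example. The value formats are assumptions rather than something this commit confirms: `"false"` as the way to switch the flag off, and a plain integer byte count (200 MB here) for the size limit.

```sql
-- Hypothetical query: skip the range-request attempt and raise the non-range size cap.
-- The endpoint and token are the placeholders from the doc example.
-- Assumed value formats: "false" for the flag, and "209715200"
-- (200 * 1024 * 1024 bytes) for the limit; neither is shown in the commit.
SELECT * FROM
HTTP(
    "uri" = "https://api.example.com/v1/data",
    "format" = "json",
    "http.header.Authorization" = "Bearer your_token",
    "http.enable.range.request" = "false",
    "http.max.request.size.bytes" = "209715200",
    "strip_outer_array" = "true"
);
```

Per the parameter table, `http.max.request.size.bytes` applies only in non-range mode, so raising it has an effect here because range requests are turned off (it would also apply after an automatic fallback).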