This is an automated email from the ASF dual-hosted git repository. kassiez pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new db80571adc9 [fix] Fix typo of admin manual (#2264) db80571adc9 is described below commit db80571adc97f5300349566c84f14c72ad95f98e Author: KassieZ <139741991+kass...@users.noreply.github.com> AuthorDate: Mon Apr 7 17:01:45 2025 +0800 [fix] Fix typo of admin manual (#2264) ## Versions - [ ] dev - [ ] 3.0 - [ ] 2.1 - [ ] 2.0 ## Languages - [ ] Chinese - [ ] English ## Docs Checklist - [ ] Checked by AI - [ ] Test Cases Built --- docs/admin-manual/log-management/fe-log.md | 2 +- .../current/admin-manual/log-management/fe-log.md | 2 +- .../admin-manual/log-management/fe-log.md | 2 +- .../admin-manual/log-management/fe-log.md | 2 +- .../admin-manual/log-management/fe-log.md | 2 +- .../admin-manual/log-management/fe-log.md | 2 +- .../sql-functions/table-valued-functions/s3.md | 240 ++++++++++----------- 7 files changed, 126 insertions(+), 126 deletions(-) diff --git a/docs/admin-manual/log-management/fe-log.md b/docs/admin-manual/log-management/fe-log.md index 18fcc599c1b..03e0f77cf0f 100644 --- a/docs/admin-manual/log-management/fe-log.md +++ b/docs/admin-manual/log-management/fe-log.md @@ -67,7 +67,7 @@ The following configuration items are configured in the `fe.conf` file. | `sys_log_enable_compress` | false | true, false | Whether to enable compression for historical `fe.log` and `fe.warn.log` logs. Default is off. When enabled, historical audit logs will be archived using gzip compression. | | `log_rollover_strategy` | `age` | `age`, `size` | Log retention strategy, default is `age`, which retains historical logs based on time. `size` retains historical logs based on log size. | | `sys_log_delete_age` | 7d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.log` and `fe.warn.log` files. Default is 7 days. Logs older than 7 days will be automatically deleted. 
| -| `audit_log_delete_age` | 7d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.audit.log` files. Default is 30 days. Logs older than 30 days will be automatically deleted. | +| `audit_log_delete_age` | 30d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.audit.log` files. Default is 30 days. Logs older than 30 days will be automatically deleted. | | `info_sys_accumulated_file_size` | 4 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.log` files. Default is 4GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | | `warn_sys_accumulated_file_size` | 2 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.warn.log` files. Default is 2GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | | `audit_sys_accumulated_file_size` | 4 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.audit.log` files. Default is 4GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/log-management/fe-log.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/log-management/fe-log.md index 28a1cf48351..b95ff59befc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/log-management/fe-log.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/log-management/fe-log.md @@ -67,7 +67,7 @@ under the License. 
| `sys_log_enable_compress` | false | true, false | 是否开启历史 `fe.log` 和 `fe.warn.log` 日志压缩。默认关闭。开启后,历史审计日志会使用 gzip 压缩归档 | | `log_rollover_strategy` | `age` | `age`, `size` | 日志保留策略,默认为 `age`,即根据时间策略保留历史日志。`size` 为按日志大小保留历史日志 | | `sys_log_delete_age` | 7d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.log` 和 `fe.warn.log` 文件的保留天数。默认 7 天。会自动删除 7 天前的日志 | -| `audit_log_delete_age` | 7d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.audit.log` 文件的保留天数。默认 30 天。会自动删除 30 天前的日志 | +| `audit_log_delete_age` | 30d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.audit.log` 文件的保留天数。默认 30 天。会自动删除 30 天前的日志 | | `info_sys_accumulated_file_size` | 4 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.log` 文件的累计大小。默认为 4GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | | `warn_sys_accumulated_file_size` | 2 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.warn.log` 文件的累计大小。默认为 2GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | | `audit_sys_accumulated_file_size` | 4 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.audit.log` 文件的累计大小。默认为 4GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/log-management/fe-log.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/log-management/fe-log.md index 0f44070e3d3..a12ba030641 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/log-management/fe-log.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/admin-manual/log-management/fe-log.md @@ -67,7 +67,7 @@ under the License. 
| `sys_log_enable_compress` | false | true, false | 是否开启历史 `fe.log` 和 `fe.warn.log` 日志压缩。默认关闭。开启后,历史审计日志会使用 gzip 压缩归档 | | `log_rollover_strategy` | `age` | `age`, `size` | 日志保留策略,默认为 `age`,即根据时间策略保留历史日志。`size` 为按日志大小保留历史日志 | | `sys_log_delete_age` | 7d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.log` 和 `fe.warn.log` 文件的保留天数。默认 7 天。会自动删除 7 天前的日志 | -| `audit_log_delete_age` | 7d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.audit.log` 文件的保留天数。默认 30 天。会自动删除 30 天前的日志 | +| `audit_log_delete_age` | 30d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.audit.log` 文件的保留天数。默认 30 天。会自动删除 30 天前的日志 | | `info_sys_accumulated_file_size` | 4 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.log` 文件的累计大小。默认为 4GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | | `warn_sys_accumulated_file_size` | 2 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.warn.log` 文件的累计大小。默认为 2GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | | `audit_sys_accumulated_file_size` | 4 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.audit.log` 文件的累计大小。默认为 4GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/log-management/fe-log.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/log-management/fe-log.md index 0f44070e3d3..a12ba030641 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/log-management/fe-log.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/admin-manual/log-management/fe-log.md @@ -67,7 +67,7 @@ under the License. 
| `sys_log_enable_compress` | false | true, false | 是否开启历史 `fe.log` 和 `fe.warn.log` 日志压缩。默认关闭。开启后,历史审计日志会使用 gzip 压缩归档 | | `log_rollover_strategy` | `age` | `age`, `size` | 日志保留策略,默认为 `age`,即根据时间策略保留历史日志。`size` 为按日志大小保留历史日志 | | `sys_log_delete_age` | 7d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.log` 和 `fe.warn.log` 文件的保留天数。默认 7 天。会自动删除 7 天前的日志 | -| `audit_log_delete_age` | 7d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.audit.log` 文件的保留天数。默认 30 天。会自动删除 30 天前的日志 | +| `audit_log_delete_age` | 30d | 支持格式如 7d, 10h, 60m, 120s | 仅当 `log_rollover_strategy` 为 `age` 时生效。控制 `fe.audit.log` 文件的保留天数。默认 30 天。会自动删除 30 天前的日志 | | `info_sys_accumulated_file_size` | 4 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.log` 文件的累计大小。默认为 4GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | | `warn_sys_accumulated_file_size` | 2 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.warn.log` 文件的累计大小。默认为 2GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | | `audit_sys_accumulated_file_size` | 4 | | 仅当 `log_rollover_strategy` 为 `size` 时生效。控制 `fe.audit.log` 文件的累计大小。默认为 4GB。当累计日志大小超过这个阈值后,会删除历史日志文件 | diff --git a/versioned_docs/version-2.1/admin-manual/log-management/fe-log.md b/versioned_docs/version-2.1/admin-manual/log-management/fe-log.md index 18fcc599c1b..03e0f77cf0f 100644 --- a/versioned_docs/version-2.1/admin-manual/log-management/fe-log.md +++ b/versioned_docs/version-2.1/admin-manual/log-management/fe-log.md @@ -67,7 +67,7 @@ The following configuration items are configured in the `fe.conf` file. | `sys_log_enable_compress` | false | true, false | Whether to enable compression for historical `fe.log` and `fe.warn.log` logs. Default is off. When enabled, historical audit logs will be archived using gzip compression. | | `log_rollover_strategy` | `age` | `age`, `size` | Log retention strategy, default is `age`, which retains historical logs based on time. `size` retains historical logs based on log size. 
| | `sys_log_delete_age` | 7d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.log` and `fe.warn.log` files. Default is 7 days. Logs older than 7 days will be automatically deleted. | -| `audit_log_delete_age` | 7d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.audit.log` files. Default is 30 days. Logs older than 30 days will be automatically deleted. | +| `audit_log_delete_age` | 30d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.audit.log` files. Default is 30 days. Logs older than 30 days will be automatically deleted. | | `info_sys_accumulated_file_size` | 4 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.log` files. Default is 4GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | | `warn_sys_accumulated_file_size` | 2 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.warn.log` files. Default is 2GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | | `audit_sys_accumulated_file_size` | 4 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.audit.log` files. Default is 4GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. 
| diff --git a/versioned_docs/version-3.0/admin-manual/log-management/fe-log.md b/versioned_docs/version-3.0/admin-manual/log-management/fe-log.md index eded7d6cc83..f8da1d987a1 100644 --- a/versioned_docs/version-3.0/admin-manual/log-management/fe-log.md +++ b/versioned_docs/version-3.0/admin-manual/log-management/fe-log.md @@ -67,7 +67,7 @@ The following configuration items are configured in the `fe.conf` file. | `sys_log_enable_compress` | false | true, false | Whether to enable compression for historical `fe.log` and `fe.warn.log` logs. Default is off. When enabled, historical audit logs will be archived using gzip compression. | | `log_rollover_strategy` | `age` | `age`, `size` | Log retention strategy, default is `age`, which retains historical logs based on time. `size` retains historical logs based on log size. | | `sys_log_delete_age` | 7d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.log` and `fe.warn.log` files. Default is 7 days. Logs older than 7 days will be automatically deleted. | -| `audit_log_delete_age` | 7d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.audit.log` files. Default is 30 days. Logs older than 30 days will be automatically deleted. | +| `audit_log_delete_age` | 30d | Supports formats like 7d, 10h, 60m, 120s | Only effective when `log_rollover_strategy` is `age`. Controls the number of days to retain `fe.audit.log` files. Default is 30 days. Logs older than 30 days will be automatically deleted. | | `info_sys_accumulated_file_size` | 4 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.log` files. Default is 4GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | | `warn_sys_accumulated_file_size` | 2 | | Only effective when `log_rollover_strategy` is `size`. 
Controls the cumulative size of `fe.warn.log` files. Default is 2GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | | `audit_sys_accumulated_file_size` | 4 | | Only effective when `log_rollover_strategy` is `size`. Controls the cumulative size of `fe.audit.log` files. Default is 4GB. When the cumulative log size exceeds this threshold, historical log files will be deleted. | diff --git a/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/s3.md b/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/s3.md index a13aa8abdb1..d1e92bf03b0 100644 --- a/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/s3.md +++ b/versioned_docs/version-3.0/sql-manual/sql-functions/table-valued-functions/s3.md @@ -1,6 +1,6 @@ --- { - "title": "LOCAL", + "title": "S3", "language": "en" } --- @@ -26,138 +26,138 @@ under the License. ## Description -Local table-valued-function(tvf), allows users to read and access local file contents on be node, just like accessing relational table. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file format. +S3 table-valued-function (tvf) allows users to read and access file contents on S3-compatible object storage just like accessing relational table-formatted data. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file format. -## syntax +## Syntax ```sql -LOCAL( - "file_path" = "<file_path>", - "backend_id" = "<backend_id>", - "format" = "<format>" - [, "<optional_property_key>" = "<optional_property_value>" [, ...] ] - ); +S3( + "uri" = "<uri>", + "s3.access_key" = "<s3_access_key>", + "s3.secret_key" = "<s3_secret_key>", + "s3.region" = "<s3_region>", + "s3.endpoint" = "<s3_endpoint>", + "format" = "<format>" + [, "<optional_property_key>" = "<optional_property_value>" [, ...] 
] + ) ``` -## Required Parameters -| Parameter | Description | Remarks | -|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------| -| `file_path` | The path of the file to be read, which is relative to the `user_files_secure_path` directory. The `user_files_secure_path` parameter is a [BE configuration item](../../../admin-manual/config/be-config.md). <br /> The path cannot include `..`, and glob syntax can be used for pattern matching, such as `logs/*.log`. | | -| `backend_id` | The ID of the BE node where the file is located. It can be obtained via the `show backends` command. | Before version 2.1.1, Doris only supports specifying a BE node to read local data files on that node. | -| `format` | The file format, which is required. Supported formats are `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`. | | +## Required parameters +| Parameter | Description | +|-------------------|-----------------------------------------------------------------------------------------------------------------------------| +| uri | The URI for accessing S3. The function will decide whether to use Path Style or Virtual-hosted Style based on the `use_path_style` parameter. | +| s3.access_key | The access key for accessing S3. | +| s3.secret_key | The secret key for accessing S3. | +| s3.region | The region where the S3 storage is located. | +| s3.endpoint | The endpoint address of the S3 storage. | +| format | The file format, supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`. 
| + + +## Optional parameters +| Parameter | Description | Remarks | +|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| +| s3.session_token | S3 session token | | +| use_path_style | Default is `false`. The S3 SDK uses Virtual-hosted Style by default. However, some object storage systems may not support Virtual-hosted Style. In such cases, the `use_path_style` parameter can be added to force the use of Path Style. For example, MinIO only allows Path Style by default, so you need to add `use_path_style=true` when accessing MinIO. | | +| force_parsing_by_standard_uri | Default is `false`. Forces the parsing of non-standard URIs into standard URIs. | | +| column_separator | Column separator, default is `\t`. | | +| line_delimiter | Line separator, default is `\n`. | | +| compress_type | Currently UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK is supported. The default value is UNKNOWN, the type will be automatically inferred based on the suffix of `uri` | | +| read_json_by_line | Default is `"true"`, used for importing JSON format. | [JSON Load](../../../data-operate/import/file-format/json) | +| strip_outer_array | Default is `"false"`, used for importing JSON format. | [JSON Load](../../../data-operate/import/file-format/json) | +| json_root | Default is empty, used for importing JSON format. | [JSON Load](../../../data-operate/import/file-format/json) | +| jsonpaths | Default is empty, used for importing JSON format. 
| [JSON Load](../../../data-operate/import/file-format/json) | +| num_as_string | Default is `false`, used for importing JSON format. | [JSON Load](../../../data-operate/import/file-format/json) | +| fuzzy_parse | Default is `false`, used for importing JSON format. | | +| trim_double_quotes | Default is `false`, used for importing CSV format, trims the outer double quotes of each field. | | +| skip_lines | Default is 0, indicating how many lines to skip at the beginning of the CSV file. This is invalid for `csv_with_names` or `csv_with_names_and_types` formats. | | +| path_partition_keys | Specifies partition column names in the file path, e.g., `/path/to/city=beijing/date="2023-07-09"`, then fill `path_partition_keys="city,date"`, which will automatically read the corresponding column names and values from the path for import. | | +| resource | Specifies the Resource name. The S3 TVF can directly access S3 using an existing S3 Resource. To create an S3 Resource, refer to [CREATE-RESOURCE](../../sql-statements/cluster-management/compute-management/CREATE-RESOURCE). This feature is supported starting from version 2.1.4. | | + +## Notes +> 1. For AWS S3, the standard URI styles are as follows: +>> 1. AWS Client Style (Hadoop S3 Style): `s3://my-bucket/path/to/file?versionId=abc123&partNumber=77&partNumber=88`. +>> 2. Virtual Host Style: `https://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`. +>> 3. Path Style: `https://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`. +>> +>> In addition to supporting the above three common standard URI styles, other URI styles (perhaps uncommon, but possible) are also supported: +>> 1. Virtual Host AWS Client (Hadoop S3) Mixed Style: + >> `s3://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88` +> > 2. 
Path AWS Client (Hadoop S3) Mixed Style: + > > `s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88` +>> +>> Detailed usage examples can be found in the following examples. +> +> 2. To directly query a TVF or create a View based on the TVF, you need to have the USAGE permission for that Resource. To query a View created based on the TVF, only the SELECT permission for that View is required. -## Optional Parameters -| Parameter | Description | Remarks | -|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------| -| `shared_storage` | Defaults to false. If true, the specified file is located on shared storage (e.g., NAS). The shared storage must support POSIX file interfaces and be mounted on all BE nodes. <br /> When `shared_storage` is true, `backend_id` can be omitted. Doris may utilize all BE nodes to access the data. If `backend_id` is set, the data will be accessed only on the specified BE node. | Supported starting from version 2.1.2 | -| `column_separator` | The column separator, optional, defaults to `\t`. | | -| `line_delimiter` | The line delimiter, optional, defaults to `\n`. | | -| `compress_type` | The compression type, optional. Supported types are `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK`. Defaults to `UNKNOWN`, and the type will be automatically inferred from the `uri` suffix. | | -| `read_json_by_line` | For JSON format imports, optional, defaults to `true`. | Refer to: [Json Load](../../../data-operate/import/file-format/json) | -| `strip_outer_array` | For JSON format imports, optional, defaults to `false`. | Refer to: [Json Load](../../../data-operate/import/file-format/json) | -| `json_root` | For JSON format imports, optional, defaults to empty. 
| Refer to: [Json Load](../../../data-operate/import/file-format/json) | -| `json_paths` | For JSON format imports, optional, defaults to empty. | Refer to: [Json Load](../../../data-operate/import/file-format/json) | -| `num_as_string` | For JSON format imports, optional, defaults to `false`. | Refer to: [Json Load](../../../data-operate/import/file-format/json) | -| `fuzzy_parse` | For JSON format imports, optional, defaults to `false`. | Refer to: [Json Load](../../../data-operate/import/file-format/json) | -| `trim_double_quotes` | For CSV format imports, optional, defaults to `false`. If true, it will trim the outermost double quotes around each field in the CSV file. | For CSV format | -| `skip_lines` | For CSV format imports, optional, defaults to `0`, which means skipping the first few lines of the CSV file. When the format is `csv_with_names` or `csv_with_names_and_types`, this parameter is ignored. | For CSV format | -| `path_partition_keys` | Optional, specifies the partition column names carried in the file path, e.g., `/path/to/city=beijing/date="2023-07-09"`, then fill in `path_partition_keys="city,date"`. This will automatically read the corresponding column names and values from the path for import. | | - - -## Access Control Requirements -| Privilege | Object | Notes | -| :--------- |:-------|:------| -| ADMIN_PRIV | global | | - - -## Usage Notes - -- For more detailed usage of local tvf, please refer to [S3](./s3.md) tvf, The only difference between them is the way of accessing the storage system. - -- Access data on NAS through local tvf - - NAS shared storage allows to be mounted to multiple nodes at the same time. Each node can access files in the shared storage just like local files. Therefore, the NAS can be thought of as a local file system, accessed through local tvf. +## Examples - When setting `"shared_storage" = "true"`, Doris will think that the specified file can be accessed from any BE node. 
When a set of files is specified using wildcards, Doris will distribute requests to access files to multiple BE nodes, so that multiple nodes can be used to perform distributed file scanning and improve query performance. +- Read and access CSV format files on S3-compatible object storage + ```sql + select * from s3("uri" = "http://127.0.0.1:9312/test2/student1.csv", + "s3.access_key"= "minioadmin", + "s3.secret_key" = "minioadmin", + "s3.endpoint" = "http://127.0.0.1:9312", + "s3.region" = "us-east-1", + "format" = "csv", + "use_path_style" = "true") order by c1; + ``` -## Examples +- Can be used with `desc function` -Analyze the log file on specified BE: -```sql -select * from local( - "file_path" = "log/be.out", - "backend_id" = "10006", - "format" = "csv") - where c1 like "%start_time%" limit 10; -``` -```text -+--------------------------------------------------------+ -| c1 | -+--------------------------------------------------------+ -| start time: 2023 年 08 月 07 日 星期一 23:20:32 CST | -| start time: 2023 年 08 月 07 日 星期一 23:32:10 CST | -| start time: 2023 年 08 月 08 日 星期二 00:20:50 CST | -| start time: 2023 年 08 月 08 日 星期二 00:29:15 CST | -+--------------------------------------------------------+ -``` + ```sql + Desc function s3("uri" = "http://127.0.0.1:9312/test2/student1.csv", + "s3.access_key"= "minioadmin", + "s3.secret_key" = "minioadmin", + "s3.endpoint" = "http://127.0.0.1:9312", + "s3.region" = "us-east-1", + "format" = "csv", + "use_path_style" = "true"); + ``` -Read and access csv format files located at path `${DORIS_HOME}/student.csv`: -```sql -select * from local( - "file_path" = "student.csv", - "backend_id" = "10003", - "format" = "csv"); -``` -```text -+------+---------+--------+ -| c1 | c2 | c3 | -+------+---------+--------+ -| 1 | alice | 18 | -| 2 | bob | 20 | -| 3 | jack | 24 | -| 4 | jackson | 19 | -| 5 | liming | d18 | -+------+---------+--------+ -```--+---------+--------+ -``` +- **Usage of different uri schemas** -Query files on NAS: 
-```sql -select * from local( - "file_path" = "/mnt/doris/prefix_*.txt", - "format" = "csv", - "column_separator" =",", - "shared_storage" = "true"); -``` -```text -+------+------+------+ -| c1 | c2 | c3 | -+------+------+------+ -| 1 | 2 | 3 | -| 1 | 2 | 3 | -| 1 | 2 | 3 | -| 1 | 2 | 3 | -| 1 | 2 | 3 | -+------+------+------+ -``` + Example of http:// , https:// -Can be used with `desc function` : -```sql -desc function local( - "file_path" = "student.csv", - "backend_id" = "10003", - "format" = "csv"); -``` -```text -+-------+------+------+-------+---------+-------+ -| Field | Type | Null | Key | Default | Extra | -+-------+------+------+-------+---------+-------+ -| c1 | TEXT | Yes | false | NULL | NONE | -| c2 | TEXT | Yes | false | NULL | NONE | -| c3 | TEXT | Yes | false | NULL | NONE | -+-------+------+------+-------+---------+-------+ -``` "s3.endpoint" = "cos.ap-hongkong.myqcloud.com", + ```sql + -- Note how to write your bucket of URI and set the 'use_path_style' parameter, as well as http://. + -- Because of "use_path_style"="true", s3 will be accessed in 'path style'. + select * from s3( + "URI" = "https://endpoint/bucket/file/student.csv", + "s3.access_key"= "ak", + "s3.secret_key" = "sk", + "s3.endpoint" = "endpoint", + "s3.region" = "region", + "format" = "csv", + "use_path_style"="true"); + + -- Note how to write your bucket of URI and set the 'use_path_style' parameter, as well as http://. + -- Because of "use_path_style"="false", s3 will be accessed in 'virtual-hosted style'. + select * from s3( + "URI" = "https://bucket.endpoint/file/student.csv", + "s3.access_key"= "ak", + "s3.secret_key" = "sk", + "s3.endpoint" = "endpoint", + "s3.region" = "region", + "format" = "csv", + "use_path_style"="false"); + + -- The OSS on Alibaba Cloud and The COS on Tencent Cloud will use 'virtual-hosted style' to access s3. 
+ -- OSS + select * from s3( + "URI" = "http://example-bucket.oss-cn-beijing.aliyuncs.com/your-folder/file.parquet", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "s3.endpoint" = "oss-cn-beijing.aliyuncs.com", + "s3.region" = "oss-cn-beijing", + "format" = "parquet", + "use_path_style" = "false"); + -- COS + select * from s3( + "URI" = "https://example-bucket.cos.ap-hongkong.myqcloud.com/your-folder/file.parquet", + "s3.access_key" = "ak", + "s3.secret_key" = "sk", + "s3.endpoint" = "cos.ap-hongkong.myqcloud.com", "s3.region" = "ap-hongkong", "format" = "parquet", "use_path_style" = "false");