morrySnow commented on code in PR #1828:
URL: https://github.com/apache/doris-website/pull/1828#discussion_r1918675262
########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -24,27 +24,31 @@ specific language governing permissions and limitations under the License.

 -->

-## `mv_infos`
-
-### Name
-
-mv_infos
-
-### description
+## Description

 Table function, generating temporary tables for asynchronous materialized views, which can view information about asynchronous materialized views created in a certain database. This function is used in the from clause.

Review Comment:
   Delete this sentence as well.

########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -61,42 +65,41 @@ mysql> desc function mv_infos("database"="tpch100");
 | MvPartitionInfo    | TEXT    | No   | false | NULL    | NONE  |
 | SyncWithBaseTables | BOOLEAN | No   | false | NULL    | NONE  |
 +--------------------+---------+------+-------+---------+-------+
-12 rows in set (0.01 sec)
 ```

-* Id: Materialized View ID
-* Name: Materialized View Name
-* JobName: The job name corresponding to the materialized view
-* State: Materialized View State
-* SchemaChangeDetail: The reason why the materialized view State becomes a SchemeChange
-* RefreshState: Materialized view refresh status
-* RefreshInfo: Refreshing strategy information defined by materialized views
-* QuerySql: Query statements defined by materialized views
-* EnvInfo: Environmental information during the creation of materialized views
-* MvProperties: Materialized visual attributes
-* MvPartitionInfo: Partition information of materialized views
-* SyncWithBaseTables:Is it synchronized with the base table data? To see which partition is not synchronized, please use [SHOW PARTITIONS](../sql-reference/Show-Statements/SHOW-PARTITIONS.md)
-
-### example
-
-1. View all materialized views under db1
-
-```sql
-mysql> select * from mv_infos("database"="db1");
-```
-
-2. View the materialized view named mv1 under db1
-
-```sql
-mysql> select * from mv_infos("database"="db1") where Name = "mv1";
-```
-
-3. View the status of the materialized view named mv1 under db1
-
-```sql
-mysql> select State from mv_infos("database"="db1") where Name = "mv1";
-```
-
-### keywords
-
-    mv, infos
+The meaning of the fields is as follows:
+
+| Field              | Type    | Description |
+|--------------------|---------|-------------|
+| Id                 | BIGINT  | Materialized view ID |
+| Name               | TEXT    | Materialized view name |
+| JobName            | TEXT    | Job name corresponding to the materialized view |
+| State              | TEXT    | State of the materialized view |
+| SchemaChangeDetail | TEXT    | Reason for the state change to SchemaChange |
+| RefreshState       | TEXT    | Refresh state of the materialized view |
+| RefreshInfo        | TEXT    | Refresh strategy information defined for the materialized view |
+| QuerySql           | TEXT    | SQL query defined for the materialized view |
+| EnvInfo            | TEXT    | Environment information when the materialized view was created |
+| MvProperties       | TEXT    | Materialized view properties |
+| MvPartitionInfo    | TEXT    | Partition information of the materialized view |
+| SyncWithBaseTables | BOOLEAN | Whether the data is synchronized with the base table. To check which partition is not synchronized, use [SHOW PARTITIONS](../sql-reference/Show-Statements/SHOW-PARTITIONS.md) |
+
+## Examples
+
+- View all materialized views under db1

Review Comment:
   Do not use an unordered list here; a plain text description is fine.
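For context on the `SyncWithBaseTables` field quoted above, the linked follow-up check is SHOW PARTITIONS. A minimal sketch, with `mv1` as a hypothetical asynchronous materialized view name:

```sql
-- Per the field table above, when SyncWithBaseTables is false the
-- partition-level sync status is checked with SHOW PARTITIONS
-- (mv1 is a hypothetical materialized view name).
SHOW PARTITIONS FROM mv1;
```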
########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -24,27 +24,31 @@ specific language governing permissions and limitations under the License.

 -->
+## Description

 Table function, generating temporary tables for asynchronous materialized views, which can view information about asynchronous materialized views created in a certain database. This function is used in the from clause. This funciton is supported since 2.1.0.

Review Comment:
   Delete the sentence saying the function is supported since 2.1.0.

########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -61,42 +65,41 @@ mysql> desc function mv_infos("database"="tpch100");
+## Examples
+
+- View all materialized views under db1
+
+  ```sql
+  mysql> select * from mv_infos("database"="db1");

Review Comment:
   Examples must not contain the prompt.
   ```suggestion
   select * from mv_infos("database"="db1");
   ```
########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -24,27 +24,31 @@ specific language governing permissions and limitations under the License.
-#### syntax
+## Syntax

+```sql
+MV_INFOS("database"="<database>")
+```
+
+## Required Parameters
+
+**`<database>`**
+> Specify the cluster database name to be queried

-`mv_infos("database"="")`
-
-mv_infos() Table structure:
+## Return Value
+
+View mv_infos() Table schema:

 ```sql
-mysql> desc function mv_infos("database"="tpch100");

Review Comment:
   Delete this block; keeping only the field descriptions below is enough.
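For reference, a minimal sketch of the documented call shape; the database name `tpch100` is reused from the `desc function` example above, and the selected columns come from the field table:

```sql
-- List each asynchronous materialized view in tpch100 together with
-- its state and whether it is in sync with its base tables.
SELECT Name, State, SyncWithBaseTables
FROM mv_infos("database"="tpch100");
```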
########## docs/sql-manual/sql-functions/table-valued-functions/hdfs.md: ##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations under the License.

 -->

-## HDFS
-
-### Name
-
-hdfs
-
-### Description
+## Description

 HDFS table-valued-function(tvf), allows users to read and access file contents on S3-compatible object storage, just like accessing relational table. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file format.

-#### syntax
+## Syntax

 ```sql
-hdfs(
-  "uri" = "..",
-  "fs.defaultFS" = "...",
-  "hadoop.username" = "...",
-  "format" = "csv",
-  "keyn" = "valuen"
-  ...
+HDFS(
+  "uri" = "<uri>",
+  "fs.defaultFS" = "<fs_defaultFS>",
+  "hadoop.username" = "<hadoop_username>",
+  "format" = "<format>",
+  [, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
 );
 ```

-**parameter description**
-
-Related parameters for accessing hdfs:
-
-- `uri`: (required) hdfs uri. If the uri path does not exist or the files are empty files, hdfs tvf will return an empty result set.
-- `fs.defaultFS`: (required)
-- `hadoop.username`: (required) Can be any string, but cannot be empty.
-- `hadoop.security.authentication`: (optional)
-- `hadoop.username`: (optional)
-- `hadoop.kerberos.principal`: (optional)
-- `hadoop.kerberos.keytab`: (optional)
-- `dfs.client.read.shortcircuit`: (optional)
-- `dfs.domain.socket.path`: (optional)
-
-Related parameters for accessing HDFS in HA mode:
-
-- `dfs.nameservices`: (optional)
-- `dfs.ha.namenodes.your-nameservices`: (optional)
-- `dfs.namenode.rpc-address.your-nameservices.your-namenode`: (optional)
-- `dfs.client.failover.proxy.provider.your-nameservices`: (optional)
-
-File format parameters:
-
-- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
-- `column_separator`: (optional) default `\t`.
-- `line_delimiter`: (optional) default `\n`.
-- `compress_type`: (optional) Currently support `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK`. Default value is `UNKNOWN`, it will automatically infer the type based on the suffix of `uri`.
-
-    The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: [Json Load](../../../data-operate/import/import-way/load-json-format.md)
-
-- `read_json_by_line`: (optional) default `"true"`
-- `strip_outer_array`: (optional) default `"false"`
-- `json_root`: (optional) default `""`
-- `json_paths`: (optional) default `""`
-- `num_as_string`: (optional) default `false`
-- `fuzzy_parse`: (optional) default `false`
-
-    The following 2 parameters are used for loading in csv format:
-
-- `trim_double_quotes`: Boolean type (optional), the default value is `false`. True means that the outermost double quotes of each field in the csv file are trimmed.
-- `skip_lines`: Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled when the format is `csv_with_names` or `csv_with_names_and_types`.
-
-other kinds of parameters:
-
-- `path_partition_keys`: (optional) Specifies the column names carried in the file path. For example, if the file path is /path/to/city=beijing/date="2023-07-09", you should fill in `path_partition_keys="city,date"`. It will automatically read the corresponding column names and values from the path during load process.
-- `resource`: (optional) Specify the resource name. Hdfs Tvf can use the existing Hdfs resource to directly access Hdfs. You can refer to the method for creating an Hdfs resource: [CREATE-RESOURCE](../../sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE.md). This property is supported starting from version 2.1.4.
-
-:::tip Tip
-To directly query a TVF or create a VIEW based on that TVF, you need to have usage permission for that resource. To query a VIEW created based on TVF, you only need select permission for that VIEW.
-:::
-
-### Examples
-
-Read and access csv format files on hdfs storage.
-
-```sql
-MySQL [(none)]> select * from hdfs(
-            "uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
-            "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-            "hadoop.username" = "doris",
-            "format" = "csv");
-+------+---------+------+
-| c1   | c2      | c3   |
-+------+---------+------+
-| 1    | alice   | 18   |
-| 2    | bob     | 20   |
-| 3    | jack    | 24   |
-| 4    | jackson | 19   |
-| 5    | liming  | 18   |
-+------+---------+------+
-```
-
-Read and access csv format files on hdfs storage in HA mode.
-
-```sql
-MySQL [(none)]> select * from hdfs(
-            "uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
-            "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-            "hadoop.username" = "doris",
-            "format" = "csv",
-            "dfs.nameservices" = "my_hdfs",
-            "dfs.ha.namenodes.my_hdfs" = "nn1,nn2",
-            "dfs.namenode.rpc-address.my_hdfs.nn1" = "nanmenode01:8020",
-            "dfs.namenode.rpc-address.my_hdfs.nn2" = "nanmenode02:8020",
-            "dfs.client.failover.proxy.provider.my_hdfs" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
-+------+---------+------+
-| c1   | c2      | c3   |
-+------+---------+------+
-| 1    | alice   | 18   |
-| 2    | bob     | 20   |
-| 3    | jack    | 24   |
-| 4    | jackson | 19   |
-| 5    | liming  | 18   |
-+------+---------+------+
-```
-
-Can be used with `desc function`:
-
-```sql
-MySQL [(none)]> desc function hdfs(
-            "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student_with_names.csv",
-            "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-            "hadoop.username" = "doris",
-            "format" = "csv_with_names");
-```
-
-### Keywords
-
-    hdfs, table-valued-function, tvf
-
-### Best Practice
-
-  For more detailed usage of HDFS tvf, please refer to [S3](./s3.md) tvf, The only difference between them is the way of accessing the storage system.
+## Required Parameters
+
+| Parameter         | Description |
+|-------------------|-------------|
+| `uri`             | The URI for accessing HDFS. If the URI path does not exist or the file is empty, the HDFS TVF will return an empty set. |
+| `fs.defaultFS`    | The default file system URI for HDFS |
+| `hadoop.username` | Required, can be any string but cannot be empty. |
+| `format`          | File format, required. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`. |
+## Optional Parameters
+
+| Parameter | Description | Remarks |
+|-----------|-------------|---------|
+| `hadoop.security.authentication` | HDFS security authentication type | |
+| `hadoop.username` | Alternative HDFS username | |
+| `hadoop.kerberos.principal` | Kerberos principal | |
+| `hadoop.kerberos.keytab` | Kerberos keytab | |
+| `dfs.client.read.shortcircuit` | Enable short-circuit read | |
+| `dfs.domain.socket.path` | Domain socket path | |
+| `dfs.nameservices` | The nameservice for HA mode | |
+| `dfs.ha.namenodes.your-nameservices` | Configuration for namenodes in HA mode | |
+| `dfs.namenode.rpc-address.your-nameservices.your-namenode` | Specify the RPC address for the namenode | |
+| `dfs.client.failover.proxy.provider.your-nameservices` | Specify the proxy provider for failover | |
+| `column_separator` | Column separator, default is `\t` | |
+| `line_delimiter` | Line separator, default is `\n` | |
+| `compress_type` | Supported types: `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK`. Default is `UNKNOWN`, and the type will be automatically inferred based on the URI suffix. | |
+| `read_json_by_line` | For JSON format imports, default is `true` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `strip_outer_array` | For JSON format imports, default is `false` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `json_root` | For JSON format imports, default is empty | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `json_paths` | For JSON format imports, default is empty | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `num_as_string` | For JSON format imports, default is `false` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `fuzzy_parse` | For JSON format imports, default is `false` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `trim_double_quotes` | For CSV format imports, boolean type. Default is `false`. If `true`, removes the outermost double quotes from each field. | |
+| `skip_lines` | For CSV format imports, integer type. Default is 0. Skips the first few lines of the CSV file. This parameter is ignored if `csv_with_names` or `csv_with_names_and_types` is set. | |
+| `path_partition_keys` | Specify the partition column names carried in the file path, for example `/path/to/city=beijing/date="2023-07-09"`, then fill in `path_partition_keys="city,date"`, which will automatically read the corresponding column names and values from the path for import. | |
+| `resource` | Specify the resource name. HDFS TVF can directly access HDFS using an existing HDFS resource. Refer to [CREATE-RESOURCE](../../sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE.md) for creating HDFS resources. | Supported from version 2.1.4 and above. |
+
+## Usage Notes
+
+- To directly query a TVF or create a View based on the TVF, you need to have the **USAGE** privilege on the respective Resource. When querying a View created based on the TVF, only the **SELECT** privilege on the View is required.
Review Comment:
   For the privileges part, add a dedicated section called Access Control Requirements.
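A sketch of the statements such an Access Control Requirements section might show, assuming Doris's GRANT syntax; the resource, view, and user names are hypothetical:

```sql
-- Querying the TVF directly, or creating a view on top of it,
-- needs USAGE on the resource (hdfs_resource and example_user
-- are hypothetical names).
GRANT USAGE_PRIV ON RESOURCE 'hdfs_resource' TO 'example_user'@'%';

-- Querying a view built on the TVF only needs SELECT on that view
-- (db1.v1 is a hypothetical view).
GRANT SELECT_PRIV ON db1.v1 TO 'example_user'@'%';
```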
########## docs/sql-manual/sql-functions/table-valued-functions/hdfs.md: ##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations under the License.
+HDFS(
+  "uri" = "<uri>",
+  "fs.defaultFS" = "<fs_defaultFS>",
+  "hadoop.username" = "<hadoop_username>",
+  "format" = "<format>",
+  [, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
+);
 ```

+## Required Parameters

Review Comment:
   First add a table explaining what `<optional_property_key>` and `<optional_property_value>` are. In the description of `<optional_property_key>`, note that the available keys are listed in the table below.
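For reference, a sketch of a call that supplies one of the optional property keys alongside the required ones; the URI and credentials reuse the sample values from the examples quoted above, and `column_separator` comes from the optional-parameter table:

```sql
-- Read a CSV file from HDFS, overriding the default tab separator;
-- "column_separator" stands in for any <optional_property_key>.
SELECT *
FROM hdfs(
  "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student.csv",
  "fs.defaultFS" = "hdfs://127.0.0.1:8424",
  "hadoop.username" = "doris",
  "format" = "csv",
  "column_separator" = ","
);
```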
########## docs/sql-manual/sql-functions/table-valued-functions/hdfs.md: ##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations under the License.
+## Usage Notes
+
+- To directly query a TVF or create a View based on the TVF, you need to have the **USAGE** privilege on the respective Resource. When querying a View created based on the TVF, only the **SELECT** privilege on the View is required.
+
+- For more detailed usage of HDFS TVF, please refer to the [S3](./s3.md) TVF, the only difference being the method of accessing the storage system.

Review Comment:
   Delete this usage note that refers readers to S3.

########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -61,42 +65,41 @@ mysql> desc function mv_infos("database"="tpch100");
+## Examples
+
+- View all materialized views under db1
+
+  ```sql
+  mysql> select * from mv_infos("database"="db1");

Review Comment:
   Every example should include a corresponding sample of its execution result.
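A sketch of the requested style, pairing the query with an illustrative result; the column set follows the field table above, the row values are hypothetical, and trailing columns are elided:

```sql
select * from mv_infos("database"="db1");
-- Illustrative output shape only; the Id, JobName, and State values
-- below are hypothetical:
-- +------+------+--------------+--------+-----+
-- | Id   | Name | JobName      | State  | ... |
-- +------+------+--------------+--------+-----+
-- | 1    | mv1  | inner_mtmv_1 | NORMAL | ... |
-- +------+------+--------------+--------+-----+
```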
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: commits-h...@doris.apache.org