morrySnow commented on code in PR #1828:
URL: https://github.com/apache/doris-website/pull/1828#discussion_r1918675262
########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -24,27 +24,31 @@ specific language governing permissions and limitations under the License.

 -->

-## `mv_infos`
-
-### Name
-
-mv_infos
-
-### description
+## Description

 Table function, generating temporary tables for asynchronous materialized views, which can view information about asynchronous materialized views created in a certain database. This function is used in the from clause.

Review Comment:
   Delete this sentence as well.

########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -61,42 +65,41 @@ mysql> desc function mv_infos("database"="tpch100");
 | MvPartitionInfo    | TEXT    | No   | false | NULL    | NONE  |
 | SyncWithBaseTables | BOOLEAN | No   | false | NULL    | NONE  |
 +--------------------+---------+------+-------+---------+-------+
-12 rows in set (0.01 sec)
 ```

-* Id: Materialized View ID
-* Name: Materialized View Name
-* JobName: The job name corresponding to the materialized view
-* State: Materialized View State
-* SchemaChangeDetail: The reason why the materialized view State becomes a SchemeChange
-* RefreshState: Materialized view refresh status
-* RefreshInfo: Refreshing strategy information defined by materialized views
-* QuerySql: Query statements defined by materialized views
-* EnvInfo: Environmental information during the creation of materialized views
-* MvProperties: Materialized visual attributes
-* MvPartitionInfo: Partition information of materialized views
-* SyncWithBaseTables:Is it synchronized with the base table data? To see which partition is not synchronized, please use [SHOW PARTITIONS](../sql-reference/Show-Statements/SHOW-PARTITIONS.md)
-
-### example
-
-1. View all materialized views under db1
-
-```sql
-mysql> select * from mv_infos("database"="db1");
-```
-
-2. View the materialized view named mv1 under db1
-
-```sql
-mysql> select * from mv_infos("database"="db1") where Name = "mv1";
-```
-
-3. View the status of the materialized view named mv1 under db1
-
-```sql
-mysql> select State from mv_infos("database"="db1") where Name = "mv1";
-```
-
-### keywords
-
-    mv, infos
+The meaning of the fields is as follows:
+
+| Field              | Type    | Description |
+|--------------------|---------|-------------|
+| Id                 | BIGINT  | Materialized view ID |
+| Name               | TEXT    | Materialized view name |
+| JobName            | TEXT    | Job name corresponding to the materialized view |
+| State              | TEXT    | State of the materialized view |
+| SchemaChangeDetail | TEXT    | Reason for the state change to SchemaChange |
+| RefreshState       | TEXT    | Refresh state of the materialized view |
+| RefreshInfo        | TEXT    | Refresh strategy information defined for the materialized view |
+| QuerySql           | TEXT    | SQL query defined for the materialized view |
+| EnvInfo            | TEXT    | Environment information when the materialized view was created |
+| MvProperties       | TEXT    | Materialized view properties |
+| MvPartitionInfo    | TEXT    | Partition information of the materialized view |
+| SyncWithBaseTables | BOOLEAN | Whether the data is synchronized with the base table. To check which partition is not synchronized, use [SHOW PARTITIONS](../sql-reference/Show-Statements/SHOW-PARTITIONS.md) |
+
+## Examples
+
+- View all materialized views under db1

Review Comment:
   Do not use an unordered list here; a plain text description is fine.
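For context on the `SyncWithBaseTables` field quoted above, the linked follow-up check is SHOW PARTITIONS. A minimal sketch, with `mv1` as a hypothetical asynchronous materialized view name:

```sql
-- Per the field table above, when SyncWithBaseTables is false the
-- partition-level sync status is checked with SHOW PARTITIONS
-- (mv1 is a hypothetical materialized view name).
SHOW PARTITIONS FROM mv1;
```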
########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -24,27 +24,31 @@ specific language governing permissions and limitations under the License.

 -->
+## Description

 Table function, generating temporary tables for asynchronous materialized views, which can view information about asynchronous materialized views created in a certain database. This function is used in the from clause. This funciton is supported since 2.1.0.

Review Comment:
   Delete the sentence saying the function is supported since 2.1.0.

########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -61,42 +65,41 @@ mysql> desc function mv_infos("database"="tpch100");
+## Examples
+
+- View all materialized views under db1
+
+  ```sql
+  mysql> select * from mv_infos("database"="db1");

Review Comment:
   Examples must not contain the prompt.
   ```suggestion
   select * from mv_infos("database"="db1");
   ```
########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -24,27 +24,31 @@ specific language governing permissions and limitations under the License.
-#### syntax
+## Syntax

+```sql
+MV_INFOS("database"="<database>")
+```
+
+## Required Parameters
+
+**`<database>`**
+> Specify the cluster database name to be queried

-`mv_infos("database"="")`
-
-mv_infos() Table structure:
+## Return Value
+
+View mv_infos() Table schema:

 ```sql
-mysql> desc function mv_infos("database"="tpch100");

Review Comment:
   Delete this block; keeping only the field descriptions below is enough.
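For reference, a minimal sketch of the documented call shape; the database name `tpch100` is reused from the `desc function` example above, and the selected columns come from the field table:

```sql
-- List each asynchronous materialized view in tpch100 together with
-- its state and whether it is in sync with its base tables.
SELECT Name, State, SyncWithBaseTables
FROM mv_infos("database"="tpch100");
```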
########## docs/sql-manual/sql-functions/table-valued-functions/hdfs.md: ##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations under the License.

 -->

-## HDFS
-
-### Name
-
-hdfs
-
-### Description
+## Description

 HDFS table-valued-function(tvf), allows users to read and access file contents on S3-compatible object storage, just like accessing relational table. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file format.

-#### syntax
+## Syntax

 ```sql
-hdfs(
-  "uri" = "..",
-  "fs.defaultFS" = "...",
-  "hadoop.username" = "...",
-  "format" = "csv",
-  "keyn" = "valuen"
-  ...
+HDFS(
+  "uri" = "<uri>",
+  "fs.defaultFS" = "<fs_defaultFS>",
+  "hadoop.username" = "<hadoop_username>",
+  "format" = "<format>",
+  [, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
 );
 ```

-**parameter description**
-
-Related parameters for accessing hdfs:
-
-- `uri`: (required) hdfs uri. If the uri path does not exist or the files are empty files, hdfs tvf will return an empty result set.
-- `fs.defaultFS`: (required)
-- `hadoop.username`: (required) Can be any string, but cannot be empty.
-- `hadoop.security.authentication`: (optional)
-- `hadoop.username`: (optional)
-- `hadoop.kerberos.principal`: (optional)
-- `hadoop.kerberos.keytab`: (optional)
-- `dfs.client.read.shortcircuit`: (optional)
-- `dfs.domain.socket.path`: (optional)
-
-Related parameters for accessing HDFS in HA mode:
-
-- `dfs.nameservices`: (optional)
-- `dfs.ha.namenodes.your-nameservices`: (optional)
-- `dfs.namenode.rpc-address.your-nameservices.your-namenode`: (optional)
-- `dfs.client.failover.proxy.provider.your-nameservices`: (optional)
-
-File format parameters:
-
-- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
-- `column_separator`: (optional) default `\t`.
-- `line_delimiter`: (optional) default `\n`.
-- `compress_type`: (optional) Currently support `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK`. Default value is `UNKNOWN`, it will automatically infer the type based on the suffix of `uri`.
-
-    The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: [Json Load](../../../data-operate/import/import-way/load-json-format.md)
-
-- `read_json_by_line`: (optional) default `"true"`
-- `strip_outer_array`: (optional) default `"false"`
-- `json_root`: (optional) default `""`
-- `json_paths`: (optional) default `""`
-- `num_as_string`: (optional) default `false`
-- `fuzzy_parse`: (optional) default `false`
-
-    The following 2 parameters are used for loading in csv format:
-
-- `trim_double_quotes`: Boolean type (optional), the default value is `false`. True means that the outermost double quotes of each field in the csv file are trimmed.
-- `skip_lines`: Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled when the format is `csv_with_names` or `csv_with_names_and_types`.
-
-other kinds of parameters:
-
-- `path_partition_keys`: (optional) Specifies the column names carried in the file path. For example, if the file path is /path/to/city=beijing/date="2023-07-09", you should fill in `path_partition_keys="city,date"`. It will automatically read the corresponding column names and values from the path during load process.
-- `resource`: (optional) Specify the resource name. Hdfs Tvf can use the existing Hdfs resource to directly access Hdfs. You can refer to the method for creating an Hdfs resource: [CREATE-RESOURCE](../../sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE.md). This property is supported starting from version 2.1.4.
-
-:::tip Tip
-To directly query a TVF or create a VIEW based on that TVF, you need to have usage permission for that resource. To query a VIEW created based on TVF, you only need select permission for that VIEW.
-:::
-
-### Examples
-
-Read and access csv format files on hdfs storage.
-
-```sql
-MySQL [(none)]> select * from hdfs(
-            "uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
-            "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-            "hadoop.username" = "doris",
-            "format" = "csv");
-+------+---------+------+
-| c1   | c2      | c3   |
-+------+---------+------+
-| 1    | alice   | 18   |
-| 2    | bob     | 20   |
-| 3    | jack    | 24   |
-| 4    | jackson | 19   |
-| 5    | liming  | 18   |
-+------+---------+------+
-```
-
-Read and access csv format files on hdfs storage in HA mode.
-
-```sql
-MySQL [(none)]> select * from hdfs(
-            "uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
-            "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-            "hadoop.username" = "doris",
-            "format" = "csv",
-            "dfs.nameservices" = "my_hdfs",
-            "dfs.ha.namenodes.my_hdfs" = "nn1,nn2",
-            "dfs.namenode.rpc-address.my_hdfs.nn1" = "nanmenode01:8020",
-            "dfs.namenode.rpc-address.my_hdfs.nn2" = "nanmenode02:8020",
-            "dfs.client.failover.proxy.provider.my_hdfs" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
-+------+---------+------+
-| c1   | c2      | c3   |
-+------+---------+------+
-| 1    | alice   | 18   |
-| 2    | bob     | 20   |
-| 3    | jack    | 24   |
-| 4    | jackson | 19   |
-| 5    | liming  | 18   |
-+------+---------+------+
-```
-
-Can be used with `desc function`:
-
-```sql
-MySQL [(none)]> desc function hdfs(
-            "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student_with_names.csv",
-            "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-            "hadoop.username" = "doris",
-            "format" = "csv_with_names");
-```
-
-### Keywords
-
-    hdfs, table-valued-function, tvf
-
-### Best Practice
-
-  For more detailed usage of HDFS tvf, please refer to [S3](./s3.md) tvf, The only difference between them is the way of accessing the storage system.
+## Required Parameters
+
+| Parameter         | Description |
+|-------------------|-------------|
+| `uri`             | The URI for accessing HDFS. If the URI path does not exist or the file is empty, the HDFS TVF will return an empty set. |
+| `fs.defaultFS`    | The default file system URI for HDFS |
+| `hadoop.username` | Required, can be any string but cannot be empty. |
+| `format`          | File format, required. Currently supports `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`. |
+## Optional Parameters
+
+| Parameter | Description | Remarks |
+|-----------|-------------|---------|
+| `hadoop.security.authentication` | HDFS security authentication type | |
+| `hadoop.username` | Alternative HDFS username | |
+| `hadoop.kerberos.principal` | Kerberos principal | |
+| `hadoop.kerberos.keytab` | Kerberos keytab | |
+| `dfs.client.read.shortcircuit` | Enable short-circuit read | |
+| `dfs.domain.socket.path` | Domain socket path | |
+| `dfs.nameservices` | The nameservice for HA mode | |
+| `dfs.ha.namenodes.your-nameservices` | Configuration for namenodes in HA mode | |
+| `dfs.namenode.rpc-address.your-nameservices.your-namenode` | Specify the RPC address for the namenode | |
+| `dfs.client.failover.proxy.provider.your-nameservices` | Specify the proxy provider for failover | |
+| `column_separator` | Column separator, default is `\t` | |
+| `line_delimiter` | Line separator, default is `\n` | |
+| `compress_type` | Supported types: `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK`. Default is `UNKNOWN`, and the type will be automatically inferred based on the URI suffix. | |
+| `read_json_by_line` | For JSON format imports, default is `true` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `strip_outer_array` | For JSON format imports, default is `false` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `json_root` | For JSON format imports, default is empty | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `json_paths` | For JSON format imports, default is empty | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `num_as_string` | For JSON format imports, default is `false` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `fuzzy_parse` | For JSON format imports, default is `false` | Reference: [JSON Load](../../../data-operate/import/import-way/load-json-format.md) |
+| `trim_double_quotes` | For CSV format imports, boolean type. Default is `false`. If `true`, removes the outermost double quotes from each field. | |
+| `skip_lines` | For CSV format imports, integer type. Default is 0. Skips the first few lines of the CSV file. This parameter is ignored if `csv_with_names` or `csv_with_names_and_types` is set. | |
+| `path_partition_keys` | Specify the partition column names carried in the file path, for example `/path/to/city=beijing/date="2023-07-09"`, then fill in `path_partition_keys="city,date"`, which will automatically read the corresponding column names and values from the path for import. | |
+| `resource` | Specify the resource name. HDFS TVF can directly access HDFS using an existing HDFS resource. Refer to [CREATE-RESOURCE](../../sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE.md) for creating HDFS resources. | Supported from version 2.1.4 and above. |
+
+## Usage Notes
+
+- To directly query a TVF or create a View based on the TVF, you need to have the **USAGE** privilege on the respective Resource. When querying a View created based on the TVF, only the **SELECT** privilege on the View is required.
Review Comment:
   For the privileges part, add a dedicated section called Access Control Requirements.
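A sketch of the statements such an Access Control Requirements section might show, assuming Doris's GRANT syntax; the resource, view, and user names are hypothetical:

```sql
-- Querying the TVF directly, or creating a view on top of it,
-- needs USAGE on the resource (hdfs_resource and example_user
-- are hypothetical names).
GRANT USAGE_PRIV ON RESOURCE 'hdfs_resource' TO 'example_user'@'%';

-- Querying a view built on the TVF only needs SELECT on that view
-- (db1.v1 is a hypothetical view).
GRANT SELECT_PRIV ON db1.v1 TO 'example_user'@'%';
```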
########## docs/sql-manual/sql-functions/table-valued-functions/hdfs.md: ##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations under the License.
+HDFS(
+  "uri" = "<uri>",
+  "fs.defaultFS" = "<fs_defaultFS>",
+  "hadoop.username" = "<hadoop_username>",
+  "format" = "<format>",
+  [, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
+);
 ```

+## Required Parameters

Review Comment:
   First add a table explaining what `<optional_property_key>` and `<optional_property_value>` are. In the description of `<optional_property_key>`, note that the available keys are listed in the table below.
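For reference, a sketch of a call that supplies one of the optional property keys alongside the required ones; the URI and credentials reuse the sample values from the examples quoted above, and `column_separator` comes from the optional-parameter table:

```sql
-- Read a CSV file from HDFS, overriding the default tab separator;
-- "column_separator" stands in for any <optional_property_key>.
SELECT *
FROM hdfs(
  "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student.csv",
  "fs.defaultFS" = "hdfs://127.0.0.1:8424",
  "hadoop.username" = "doris",
  "format" = "csv",
  "column_separator" = ","
);
```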
########## docs/sql-manual/sql-functions/table-valued-functions/hdfs.md: ##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations under the License.
+## Usage Notes
+
+- To directly query a TVF or create a View based on the TVF, you need to have the **USAGE** privilege on the respective Resource. When querying a View created based on the TVF, only the **SELECT** privilege on the View is required.
+
+- For more detailed usage of HDFS TVF, please refer to the [S3](./s3.md) TVF, the only difference being the method of accessing the storage system.

Review Comment:
   Delete this usage note that refers readers to S3.

########## docs/sql-manual/sql-functions/table-valued-functions/mv_infos.md: ##########
@@ -61,42 +65,41 @@ mysql> desc function mv_infos("database"="tpch100");
+## Examples
+
+- View all materialized views under db1
+
+  ```sql
+  mysql> select * from mv_infos("database"="db1");

Review Comment:
   Every example should include a corresponding sample of its execution result.
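A sketch of the requested style, pairing the query with an illustrative result; the column set follows the field table above, the row values are hypothetical, and trailing columns are elided:

```sql
select * from mv_infos("database"="db1");
-- Illustrative output shape only; the Id, JobName, and State values
-- below are hypothetical:
-- +------+------+--------------+--------+-----+
-- | Id   | Name | JobName      | State  | ... |
-- +------+------+--------------+--------+-----+
-- | 1    | mv1  | inner_mtmv_1 | NORMAL | ... |
-- +------+------+--------------+--------+-----+
```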
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: commits-h...@doris.apache.org