DongLiang-0 commented on code in PR #1828:
URL: https://github.com/apache/doris-website/pull/1828#discussion_r1919524659
##########
docs/sql-manual/sql-functions/table-valued-functions/hdfs.md:
##########
@@ -24,139 +24,117 @@ specific language governing permissions and limitations
 under the License. -->
 
-## HDFS
-
-### Name
-
-hdfs
-
-### Description
+## Description
 
 HDFS table-valued function (TVF) allows users to read and access file contents on HDFS storage just like accessing a relational table. It currently supports the `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc` file formats.
 
-#### syntax
+## Syntax
 
 ```sql
-hdfs(
-  "uri" = "..",
-  "fs.defaultFS" = "...",
-  "hadoop.username" = "...",
-  "format" = "csv",
-  "keyn" = "valuen"
-  ...
+HDFS(
+  "uri" = "<uri>",
+  "fs.defaultFS" = "<fs_defaultFS>",
+  "hadoop.username" = "<hadoop_username>",
+  "format" = "<format>"
+  [, "<optional_property_key>" = "<optional_property_value>" [, ...] ]
  );
 ```
 
-**parameter description**
-
-Related parameters for accessing hdfs:
-
-- `uri`: (required) hdfs uri. If the uri path does not exist or the files are empty files, hdfs tvf will return an empty result set.
-- `fs.defaultFS`: (required)
-- `hadoop.username`: (required) Can be any string, but cannot be empty.
-- `hadoop.security.authentication`: (optional)
-- `hadoop.username`: (optional)
-- `hadoop.kerberos.principal`: (optional)
-- `hadoop.kerberos.keytab`: (optional)
-- `dfs.client.read.shortcircuit`: (optional)
-- `dfs.domain.socket.path`: (optional)
-
-Related parameters for accessing HDFS in HA mode:
-
-- `dfs.nameservices`: (optional)
-- `dfs.ha.namenodes.your-nameservices`: (optional)
-- `dfs.namenode.rpc-address.your-nameservices.your-namenode`: (optional)
-- `dfs.client.failover.proxy.provider.your-nameservices`: (optional)
-
-File format parameters:
-
-- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
-- `column_separator`: (optional) default `\t`.
-- `line_delimiter`: (optional) default `\n`.
-- `compress_type`: (optional) Currently support `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE/SNAPPYBLOCK`. Default value is `UNKNOWN`, it will automatically infer the type based on the suffix of `uri`.
-
-    The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: [Json Load](../../../data-operate/import/import-way/load-json-format.md)
-
-- `read_json_by_line`: (optional) default `"true"`
-- `strip_outer_array`: (optional) default `"false"`
-- `json_root`: (optional) default `""`
-- `json_paths`: (optional) default `""`
-- `num_as_string`: (optional) default `false`
-- `fuzzy_parse`: (optional) default `false`
-
-    The following 2 parameters are used for loading in csv format:
-
-- `trim_double_quotes`: Boolean type (optional), the default value is `false`. True means that the outermost double quotes of each field in the csv file are trimmed.
-- `skip_lines`: Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled when the format is `csv_with_names` or `csv_with_names_and_types`.
-
-other kinds of parameters:
-
-- `path_partition_keys`: (optional) Specifies the column names carried in the file path. For example, if the file path is /path/to/city=beijing/date="2023-07-09", you should fill in `path_partition_keys="city,date"`. It will automatically read the corresponding column names and values from the path during load process.
-- `resource`:(optional)Specify the resource name. Hdfs Tvf can use the existing Hdfs resource to directly access Hdfs.
-You can refer to the method for creating an Hdfs resource: [CREATE-RESOURCE](../../sql-statements/Data-Definition-Statements/Create/CREATE-RESOURCE.md). This property is supported starting from version 2.1.4.
-
-:::tip Tip
-To directly query a TVF or create a VIEW based on that TVF, you need to have usage permission for that resource. To query a VIEW created based on TVF, you only need select permission for that VIEW.
-:::
-
-### Examples
-
-Read and access csv format files on hdfs storage.
-
-```sql
-MySQL [(none)]> select * from hdfs(
-  "uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
-  "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-  "hadoop.username" = "doris",
-  "format" = "csv");
-+------+---------+------+
-| c1   | c2      | c3   |
-+------+---------+------+
-| 1    | alice   | 18   |
-| 2    | bob     | 20   |
-| 3    | jack    | 24   |
-| 4    | jackson | 19   |
-| 5    | liming  | 18   |
-+------+---------+------+
-```
-
-Read and access csv format files on hdfs storage in HA mode.
-
-```sql
-MySQL [(none)]> select * from hdfs(
-  "uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
-  "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-  "hadoop.username" = "doris",
-  "format" = "csv",
-  "dfs.nameservices" = "my_hdfs",
-  "dfs.ha.namenodes.my_hdfs" = "nn1,nn2",
-  "dfs.namenode.rpc-address.my_hdfs.nn1" = "nanmenode01:8020",
-  "dfs.namenode.rpc-address.my_hdfs.nn2" = "nanmenode02:8020",
-  "dfs.client.failover.proxy.provider.my_hdfs" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
-+------+---------+------+
-| c1   | c2      | c3   |
-+------+---------+------+
-| 1    | alice   | 18   |
-| 2    | bob     | 20   |
-| 3    | jack    | 24   |
-| 4    | jackson | 19   |
-| 5    | liming  | 18   |
-+------+---------+------+
-```
-
-Can be used with `desc function` :
-
-```sql
-MySQL [(none)]> desc function hdfs(
-  "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student_with_names.csv",
-  "fs.defaultFS" = "hdfs://127.0.0.1:8424",
-  "hadoop.username" = "doris",
-  "format" = "csv_with_names");
-```
-
-### Keywords
-
-    hdfs, table-valued-function, tvf
-
-### Best Practice
-
-  For more detailed usage of HDFS tvf, please refer to [S3](./s3.md) tvf, The only difference between them is the way of accessing the storage system.
+## Required Parameters

Review Comment:
   I have added a description like the following below the optional parameters:
   ```
   ## Optional Parameters

   The `optional_property_key` in the syntax above can be selected as needed from the following list, and `optional_property_value` is the value of the chosen parameter.
   ```
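   For illustration, here is a minimal sketch of how such an optional key/value pair slots into the new syntax. It is not part of the PR diff: the URI, username, and property values are placeholders, and `column_separator` / `skip_lines` are borrowed from the optional-parameter list in the diff above.
   ```sql
   -- Sketch only: placeholder host/path/values, not taken from the PR.
   -- Two optional properties are appended after the four required ones.
   SELECT * FROM hdfs(
     "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student.csv",
     "fs.defaultFS" = "hdfs://127.0.0.1:8424",
     "hadoop.username" = "doris",
     "format" = "csv",
     "column_separator" = ",",  -- optional property: overrides the default \t separator
     "skip_lines" = "1"         -- optional property: skip the first line of the csv file
   );
   ```
   Each additional pair just follows the `"<optional_property_key>" = "<optional_property_value>"` shape from the Syntax block; nothing else about the call changes.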