This is an automated email from the ASF dual-hosted git repository.

jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git

The following commit(s) were added to refs/heads/master by this push:
     new cac55feff8a doc fix
cac55feff8a is described below

commit cac55feff8a99c5ff046869952f51de6722dd61c
Author: jiafeng.zhang <zhang...@gmail.com>
AuthorDate: Thu Aug 11 09:53:26 2022 +0800

    doc fix
---
 .../import/import-way/insert-into-manual.md   |  7 +++-
 .../import/import-way/spark-load-manual.md    | 46 +++++++++++++++++++---
 docs/ecosystem/external-table/doris-on-es.md  |  4 +-
 .../import/import-way/insert-into-manual.md   |  7 +++-
 .../import/import-way/spark-load-manual.md    | 42 +++++++++++++++++++-
 .../ecosystem/external-table/doris-on-es.md   |  4 +-
 6 files changed, 98 insertions(+), 12 deletions(-)

diff --git a/docs/data-operate/import/import-way/insert-into-manual.md b/docs/data-operate/import/import-way/insert-into-manual.md
index dceafb10c29..bd00cbb55cc 100644
--- a/docs/data-operate/import/import-way/insert-into-manual.md
+++ b/docs/data-operate/import/import-way/insert-into-manual.md
@@ -46,7 +46,7 @@ INSERT INTO tbl2 WITH LABEL label1 SELECT * FROM tbl3;
 INSERT INTO tbl1 VALUES ("qweasdzxcqweasdzxc"), ("a");
 ```
 
-> Note: When you need to use `CTE(Common Table Expressions)` as the query part in an insert operation, you must specify the `WITH LABEL` and column list parts. Example:
+> Note: When you need to use `CTE(Common Table Expressions)` as the query part in an insert operation, you must specify the `WITH LABEL` and column list parts or wrap `CTE`. Example:
 >
 > ```sql
 > INSERT INTO tbl1 WITH LABEL label1
@@ -57,6 +57,11 @@ INSERT INTO tbl1 VALUES ("qweasdzxcqweasdzxc"), ("a");
 > INSERT INTO tbl1 (k1)
 > WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
 > SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;
+>
+> INSERT INTO tbl1 (k1)
+> select * from (
+> WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
+> SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1) as ret
 > ```
 
 For specific parameter description, you can refer to [INSERT INTO](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/INSERT.md) command or execute `HELP INSERT ` to see its help documentation for better use of this import method.
diff --git a/docs/data-operate/import/import-way/spark-load-manual.md b/docs/data-operate/import/import-way/spark-load-manual.md
index d801d3af546..a1c314ec377 100644
--- a/docs/data-operate/import/import-way/spark-load-manual.md
+++ b/docs/data-operate/import/import-way/spark-load-manual.md
@@ -153,7 +153,11 @@ PROPERTIES
   spark_conf_key = spark_conf_value,
   working_dir = path,
   broker = broker_name,
-  broker.property_key = property_value
+  broker.property_key = property_value,
+  hadoop.security.authentication = kerberos,
+  kerberos_principal = do...@your.com,
+  kerberos_keytab = /home/doris/my.keytab
+  kerberos_keytab_content = ASDOWHDLAWIDJHWLDKSALDJSDIWALD
 )
 
 -- drop spark resource
@@ -178,7 +182,6 @@ REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
 `Properties` are the parameters related to spark resources, as follows:
 
 - `type`: resource type, required. Currently, only spark is supported.
-
 - Spark related parameters are as follows:
   - `spark.master`: required, yarn is supported at present, `spark://host:port`.
@@ -190,11 +193,12 @@ REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
   - `spark.hadoop.fs.defaultfs`: required when master is yarn.
   - Other parameters are optional, refer to `http://spark.apache.org/docs/latest/configuration.html`
-
 - `working_dir`: directory used by ETL. Spark is required when used as an ETL resource. For example: `hdfs://host :port/tmp/doris`.
-
+- `hadoop.security.authentication`: Specify the authentication method as kerberos.
+- `kerberos_principal`: Specify the principal of kerberos.
+- `kerberos_keytab`: Specify the path to the keytab file for kerberos. The file must be an absolute path to a file on the server where the broker process is located. And can be accessed by the Broker process.
+- `kerberos_keytab_content`: Specify the content of the keytab file in kerberos after base64 encoding. You can choose one of these with `kerberos_keytab` configuration.
 - `broker`: the name of the broker. Spark is required when used as an ETL resource. You need to use the 'alter system add broker' command to complete the configuration in advance.
-
 - `broker.property_key`: the authentication information that the broker needs to specify when reading the intermediate file generated by ETL.
 
 Example:
@@ -231,6 +235,38 @@ PROPERTIES
 );
 ```
 
+**Spark Load supports Kerberos authentication**
+
+If Spark load accesses Hadoop cluster resources with Kerberos authentication, we only need to specify the following parameters when creating Spark resources:
+
+- `hadoop.security.authentication`: Specify the authentication method as kerberos.
+- `kerberos_principal`: Specify the principal of kerberos.
+- `kerberos_keytab`: Specify the path to the keytab file for kerberos. The file must be an absolute path to a file on the server where the broker process is located. And can be accessed by the Broker process.
+- `kerberos_keytab_content`: Specify the content of the keytab file in kerberos after base64 encoding. You can choose one of these with `kerberos_keytab` configuration.
+
+Example:
+
+```sql
+CREATE EXTERNAL RESOURCE "spark_on_kerberos"
+PROPERTIES
+(
+  "type" = "spark",
+  "spark.master" = "yarn",
+  "spark.submit.deployMode" = "cluster",
+  "spark.jars" = "xxx.jar,yyy.jar",
+  "spark.files" = "/tmp/aaa,/tmp/bbb",
+  "spark.executor.memory" = "1g",
+  "spark.yarn.queue" = "queue0",
+  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
+  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
+  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
+  "broker" = "broker0",
+  "hadoop.security.authentication" = "kerberos",
+  "kerberos_principal" = "do...@your.com",
+  "kerberos_keytab" = "/home/doris/my.keytab"
+);
+```
+
 **Show resources**
 
 Ordinary accounts can only see the resources that they have `USAGE_PRIV` to use.
diff --git a/docs/ecosystem/external-table/doris-on-es.md b/docs/ecosystem/external-table/doris-on-es.md
index c9854373f8a..d208a1efc02 100644
--- a/docs/ecosystem/external-table/doris-on-es.md
+++ b/docs/ecosystem/external-table/doris-on-es.md
@@ -325,7 +325,7 @@ This term does not match any term in the dictionary, and will not return any res
 
 The type of `k4.keyword` is `keyword`, and writing data into ES is a complete term, so it can be matched
 
-### Enable node discovery mechanism, default is true(es\_nodes\_discovery=true)
+### Enable node discovery mechanism, default is true(nodes\_discovery=true)
 
 ```
 CREATE EXTERNAL TABLE `test` (
@@ -348,7 +348,7 @@ Parameter Description:
 
 Parameter | Description
 ---|---
-**es\_nodes\_discovery** | Whether or not to enable ES node discovery. the default is true
+**nodes\_discovery** | Whether or not to enable ES node discovery. the default is true
 
 Doris would find all available related data nodes (shards allocated on)from ES when this is true. Just set false if address of ES data nodes are not accessed by Doris BE, eg. the ES cluster is deployed in the intranet which isolated from your public Internet, and users access through a proxy
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/insert-into-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/insert-into-manual.md
index 5a420743278..d9811eda22c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/insert-into-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/insert-into-manual.md
@@ -46,7 +46,7 @@ INSERT INTO tbl2 WITH LABEL label1 SELECT * FROM tbl3;
 INSERT INTO tbl1 VALUES ("qweasdzxcqweasdzxc"), ("a");
 ```
 
-> 注意:当需要使用 `CTE(Common Table Expressions)` 作为 insert 操作中的查询部分时,必须指定 `WITH LABEL` 和 column list 部分。示例:
+> 注意:当需要使用 `CTE(Common Table Expressions)` 作为 insert 操作中的查询部分时,必须指定 `WITH LABEL` 和 column list 部分或者对`CTE`进行包装。示例:
 >
 > ```sql
 > INSERT INTO tbl1 WITH LABEL label1
@@ -57,6 +57,11 @@ INSERT INTO tbl1 VALUES ("qweasdzxcqweasdzxc"), ("a");
 > INSERT INTO tbl1 (k1)
 > WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
 > SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;
+>
+> INSERT INTO tbl1 (k1)
+> select * from (
+> WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
+> SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1) as ret
 > ```
 
 具体的参数说明,你可以参照 [INSERT INTO](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/INSERT.md) 命令或者执行`HELP INSERT` 来查看其帮助文档以便更好的使用这种导入方式。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/spark-load-manual.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/spark-load-manual.md
index b6269ea176d..6933709791e 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/spark-load-manual.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/spark-load-manual.md
@@ -126,7 +126,11 @@ PROPERTIES
   spark_conf_key = spark_conf_value,
   working_dir = path,
   broker = broker_name,
-  broker.property_key = property_value
+  broker.property_key = property_value,
+  hadoop.security.authentication = kerberos,
+  kerberos_principal = do...@your.com,
+  kerberos_keytab = /home/doris/my.keytab
+  kerberos_keytab_content = ASDOWHDLAWIDJHWLDKSALDJSDIWALD
 )
 
 -- drop spark resource
@@ -158,6 +162,10 @@ REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
 - `spark.hadoop.fs.defaultFS`: master为yarn时必填。
 - 其他参数为可选,参考http://spark.apache.org/docs/latest/configuration.html
 - `working_dir`: ETL 使用的目录。spark作为ETL资源使用时必填。例如:hdfs://host:port/tmp/doris。
+- `hadoop.security.authentication`:指定认证方式为 kerberos。
+- `kerberos_principal`:指定 kerberos 的 principal。
+- `kerberos_keytab`:指定 kerberos 的 keytab 文件路径。该文件必须为 Broker 进程所在服务器上的文件的绝对路径。并且可以被 Broker 进程访问。
+- `kerberos_keytab_content`:指定 kerberos 中 keytab 文件内容经过 base64 编码之后的内容。这个跟 `kerberos_keytab` 配置二选一即可。
 - `broker`: broker 名字。spark 作为 ETL 资源使用时必填。需要使用 `ALTER SYSTEM ADD BROKER` 命令提前完成配置。
 - `broker.property_key`: broker 读取 ETL 生成的中间文件时需要指定的认证信息等。
 
@@ -192,6 +200,38 @@ PROPERTIES
   "spark.submit.deployMode" = "client",
   "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
   "broker" = "broker1"
+)
+```
+
+**Spark Load 支持 Kerberos 认证**
+
+如果是 Spark load 访问带有 Kerberos 认证的 Hadoop 集群资源,我们只需要在创建 Spark resource 的时候指定以下参数即可:
+
+- `hadoop.security.authentication`:指定认证方式为 kerberos。
+- `kerberos_principal`:指定 kerberos 的 principal。
+- `kerberos_keytab`:指定 kerberos 的 keytab 文件路径。该文件必须为 Broker 进程所在服务器上的文件的绝对路径。并且可以被 Broker 进程访问。
+- `kerberos_keytab_content`:指定 kerberos 中 keytab 文件内容经过 base64 编码之后的内容。这个跟 `kerberos_keytab` 配置二选一即可。
+
+实例:
+
+```sql
+CREATE EXTERNAL RESOURCE "spark_on_kerberos"
+PROPERTIES
+(
+  "type" = "spark",
+  "spark.master" = "yarn",
+  "spark.submit.deployMode" = "cluster",
+  "spark.jars" = "xxx.jar,yyy.jar",
+  "spark.files" = "/tmp/aaa,/tmp/bbb",
+  "spark.executor.memory" = "1g",
+  "spark.yarn.queue" = "queue0",
+  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
+  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
+  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
+  "broker" = "broker0",
+  "hadoop.security.authentication" = "kerberos",
+  "kerberos_principal" = "do...@your.com",
+  "kerberos_keytab" = "/home/doris/my.keytab"
 );
 ```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/external-table/doris-on-es.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/external-table/doris-on-es.md
index 39c73f2621a..9dc3a729759 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/external-table/doris-on-es.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/external-table/doris-on-es.md
@@ -322,7 +322,7 @@ POST /_analyze
 
 `k4.keyword` 的类型是`keyword`,数据写入ES中是一个完整的term,所以可以匹配
 
-### 开启节点自动发现, 默认为true(es\_nodes\_discovery=true)
+### 开启节点自动发现, 默认为true(nodes\_discovery=true)
 
 ```
 CREATE EXTERNAL TABLE `test` (
@@ -345,7 +345,7 @@ PROPERTIES (
 
 参数 | 说明
 ---|---
-**es\_nodes\_discovery** | 是否开启es节点发现,默认为true
+**nodes\_discovery** | 是否开启es节点发现,默认为true
 
 当配置为true时,Doris将从ES找到所有可用的相关数据节点(在上面分配的分片)。如果ES数据节点的地址没有被Doris BE访问,则设置为false。ES集群部署在与公共Internet隔离的内网,用户通过代理访问
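
A few illustrative SQL sketches related to the pages touched above follow; they are not part of the diff. First, the note in insert-into-manual.md: the three accepted spellings it shows can be read side by side, with a comment on what makes each one acceptable. The `tbl1`, `tbl2`, `cte1`, `cte2`, `k1` and `label1` names are the hypothetical ones already used on that page.

```sql
-- Accepted: the statement carries WITH LABEL, so the CTE can follow directly.
INSERT INTO tbl1 WITH LABEL label1
WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;

-- Accepted: the target column list is spelled out before the CTE.
INSERT INTO tbl1 (k1)
WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;

-- Accepted (the form added by this doc fix): the whole CTE query is wrapped
-- in a derived table, so the WITH clause no longer sits directly under INSERT.
INSERT INTO tbl1 (k1)
SELECT * FROM (
    WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
    SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1
) AS ret;
```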
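
The spark-load-manual.md additions stop at creating the `spark_on_kerberos` resource. For readers wondering about the next step, the sketch below grants the resource to an account and references it in a load job. The user, database, table, HDFS path and column names are placeholders, and the `LOAD LABEL ... WITH RESOURCE` shape is the general Spark Load form from the same manual rather than anything specific to Kerberos.

```sql
-- Allow an ordinary account to reference the resource (needs USAGE_PRIV).
GRANT USAGE_PRIV ON RESOURCE "spark_on_kerberos" TO "user0"@"%";

-- Submit a Spark Load job through the Kerberos-enabled resource.
LOAD LABEL db1.label1
(
    DATA INFILE("hdfs://127.0.0.1:10000/user/doris/demo/file1.csv")
    INTO TABLE tbl1
    COLUMNS TERMINATED BY ","
    (k1, k2, v1)
)
WITH RESOURCE 'spark_on_kerberos'
(
    "spark.executor.memory" = "2g"
)
PROPERTIES
(
    "timeout" = "3600"
);
```

As with any other Spark Load job, progress can then be checked with `SHOW LOAD WHERE LABEL = "label1";`.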
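
Finally, for the doris-on-es.md rename: the property is set in the external table's `PROPERTIES`. A minimal sketch of a table that turns node discovery off, for the proxy/intranet case described above, might look like the following; the host, index, credentials and columns are placeholders, and `nodes_discovery` defaults to true per the parameter table.

```sql
CREATE EXTERNAL TABLE `test` (
  `k1` bigint(20) COMMENT "",
  `k2` datetime COMMENT "",
  `k3` varchar(20) COMMENT ""
) ENGINE = ELASTICSEARCH
PROPERTIES (
  -- With discovery off, BE only ever contacts this address,
  -- e.g. a proxy in front of an intranet ES cluster.
  "hosts" = "http://127.0.0.1:8200",
  "index" = "test",
  "user" = "root",
  "password" = "root",
  -- The property documented above; defaults to true.
  "nodes_discovery" = "false"
);
```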