This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push: new 46ce66cbd8 [docs](multi-catalog)update en docs (#16160) 46ce66cbd8 is described below commit 46ce66cbd86890aae82ef2eaa97798d67159d2b8 Author: Hu Yanjun <100749531+httpshir...@users.noreply.github.com> AuthorDate: Sun Jan 29 00:36:31 2023 +0800 [docs](multi-catalog)update en docs (#16160) --- docs/en/docs/lakehouse/multi-catalog/dlf.md | 78 ++++++++++- docs/en/docs/lakehouse/multi-catalog/hive.md | 147 ++++++++++++++++++++- docs/en/docs/lakehouse/multi-catalog/hudi.md | 26 +++- docs/en/docs/lakehouse/multi-catalog/iceberg.md | 49 ++++++- .../docs/lakehouse/multi-catalog/multi-catalog.md | 2 +- docs/zh-CN/docs/lakehouse/multi-catalog/hive.md | 16 +-- docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md | 2 +- 7 files changed, 304 insertions(+), 16 deletions(-) diff --git a/docs/en/docs/lakehouse/multi-catalog/dlf.md b/docs/en/docs/lakehouse/multi-catalog/dlf.md index 82bdd1f64d..d533ce943e 100644 --- a/docs/en/docs/lakehouse/multi-catalog/dlf.md +++ b/docs/en/docs/lakehouse/multi-catalog/dlf.md @@ -1,6 +1,6 @@ --- { - "title": "Aliyun DLF", + "title": "Alibaba Cloud DLF", "language": "en" } --- @@ -25,7 +25,79 @@ under the License. --> -# Aliyun DLF +# Alibaba Cloud DLF + +Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol. + +> [What is DLF](https://www.alibabacloud.com/product/datalake-formation) + +Doris can access DLF the same way as it accesses Hive Metastore. + +## Connect to DLF + +1. Create `hive-site.xml` + + Create the `hive-site.xml` file, and put it in the `fe/conf` directory. 
+ + ``` + <?xml version="1.0"?> + <configuration> + <!--Set to use dlf client--> + <property> + <name>hive.metastore.type</name> + <value>dlf</value> + </property> + <property> + <name>dlf.catalog.endpoint</name> + <value>dlf-vpc.cn-beijing.aliyuncs.com</value> + </property> + <property> + <name>dlf.catalog.region</name> + <value>cn-beijing</value> + </property> + <property> + <name>dlf.catalog.proxyMode</name> + <value>DLF_ONLY</value> + </property> + <property> + <name>dlf.catalog.uid</name> + <value>20000000000000000</value> + </property> + <property> + <name>dlf.catalog.accessKeyId</name> + <value>XXXXXXXXXXXXXXX</value> + </property> + <property> + <name>dlf.catalog.accessKeySecret</name> + <value>XXXXXXXXXXXXXXXXX</value> + </property> + </configuration> + ``` + + * `dlf.catalog.endpoint`: DLF Endpoint. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints). + * `dlf.catalog.region`: DLF Region. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints). + * `dlf.catalog.uid`: Alibaba Cloud account. You can find the "Account ID" in the upper right corner on the Alibaba Cloud console. + * `dlf.catalog.accessKeyId`: AccessKey, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak). + * `dlf.catalog.accessKeySecret`: SecretKey, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak). + + Other configuration items are fixed and require no modifications. + +2. Restart FE, and create Catalog via the `CREATE CATALOG` statement. + + Doris will read and parse `fe/conf/hive-site.xml`.
+ + ```sql + CREATE CATALOG hive_with_dlf PROPERTIES ( + "type"="hms", + "hive.metastore.uris" = "thrift://127.0.0.1:9083" + ) + ``` + + `type` should always be `hms`, while `hive.metastore.uris` can be arbitrary since it is not actually used; it only needs to follow the format of a Hive Metastore Thrift URI. + + After the above steps, you can access metadata in DLF the same way as you access Hive Metastore. + + Doris supports accessing Hive/Iceberg/Hudi metadata in DLF. + -TODO: translate diff --git a/docs/en/docs/lakehouse/multi-catalog/hive.md b/docs/en/docs/lakehouse/multi-catalog/hive.md index fd3bfd8191..18ae073160 100644 --- a/docs/en/docs/lakehouse/multi-catalog/hive.md +++ b/docs/en/docs/lakehouse/multi-catalog/hive.md @@ -26,4 +26,149 @@ under the License. # Hive -TODO: translate +By connecting to Hive Metastore, or to a metadata service compatible with Hive Metastore, Doris can access the databases and tables in Hive and run queries against them. + +Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata. Thus, Doris can also access these systems via Hive Catalog. + +## Usage + +When connecting to Hive, Doris: + +1. Supports Hive version 1/2/3; +2. Supports both Managed Table and External Table; +3. Can identify metadata of Hive, Iceberg, and Hudi stored in Hive Metastore; +4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).
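
Once a Catalog such as the `hive_with_dlf` one above has been created, it can be browsed and queried from any MySQL client connected to Doris. The following is a minimal sketch; the database and table names are illustrative placeholders:

```sql
-- Switch the session to the external catalog created above
SWITCH hive_with_dlf;
-- Browse its databases and tables the same way as local ones
SHOW DATABASES;
-- Or address a table by its fully qualified name without switching
SELECT count(*) FROM hive_with_dlf.your_db.your_table;
```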
+ +## Create Catalog + +```sql +CREATE CATALOG hive PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hadoop.username' = 'hive', + 'dfs.nameservices'='your-nameservice', + 'dfs.ha.namenodes.your-nameservice'='nn1,nn2', + 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007', + 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007', + 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' +); +``` + +In addition to `type` and `hive.metastore.uris`, which are required, you can specify other parameters regarding the connection. + +For example, to specify HDFS HA: + +```sql +CREATE CATALOG hive PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hadoop.username' = 'hive', + 'dfs.nameservices'='your-nameservice', + 'dfs.ha.namenodes.your-nameservice'='nn1,nn2', + 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007', + 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007', + 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' +); +``` + +To specify HDFS HA and Kerberos authentication information: + +```sql +CREATE CATALOG hive PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hive.metastore.sasl.enabled' = 'true', + 'dfs.nameservices'='your-nameservice', + 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007', + 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007', + 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', + 'hadoop.security.authentication' = 'kerberos', + 'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab', + 'hadoop.kerberos.principal' = 'your-princi...@your.com', + 'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port', +
'yarn.resourcemanager.principal' = 'your-rm-principal/_h...@your.com' +); +``` + +To provide Hadoop KMS encrypted transmission information: + +```sql +CREATE CATALOG hive PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms' +); +``` + +Or to connect to Hive data stored in JuiceFS: + +```sql +CREATE CATALOG hive PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hadoop.username' = 'root', + 'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem', + 'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS', + 'juicefs.meta' = 'xxx' +); +``` + +In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters, and reuse the Resource when creating new Catalogs. Here is an example: + +```sql +# 1. Create Resource +CREATE RESOURCE hms_resource PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hadoop.username' = 'hive', + 'dfs.nameservices'='your-nameservice', + 'dfs.ha.namenodes.your-nameservice'='nn1,nn2', + 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007', + 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007', + 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' +); + +# 2. Create Catalog and use an existing Resource. The key and value information below will overwrite the corresponding information in the Resource. +CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES( + 'key' = 'value' +); +``` + +You can also put the `hive-site.xml` file in the `conf` directories of FE and BE. This will enable Doris to automatically read information from `hive-site.xml`. The relevant information will be overwritten based on the following rules: + + +* Information in Resource will overwrite that in `hive-site.xml`.
+* Information in `CREATE CATALOG PROPERTIES` will overwrite that in Resource. + +### Hive Versions + +Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access Hive Metastore. You can specify a certain Hive version when creating Catalogs, for example: + +```sql +CREATE CATALOG hive PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hive.version' = '1.1.0' +); +``` + +## Column Type Mapping + +This is applicable for Hive/Iceberg/Hudi. + +| HMS Type | Doris Type | Comment | +| ------------- | ------------- | ------------------------------------------------- | +| boolean | boolean | | +| tinyint | tinyint | | +| smallint | smallint | | +| int | int | | +| bigint | bigint | | +| date | date | | +| timestamp | datetime | | +| float | float | | +| double | double | | +| char | char | | +| varchar | varchar | | +| decimal | decimal | | +| `array<type>` | `array<type>` | Supports nested arrays, such as `array<array<int>>` | +| other | unsupported | | diff --git a/docs/en/docs/lakehouse/multi-catalog/hudi.md b/docs/en/docs/lakehouse/multi-catalog/hudi.md index 79e351f994..21f093dcde 100644 --- a/docs/en/docs/lakehouse/multi-catalog/hudi.md +++ b/docs/en/docs/lakehouse/multi-catalog/hudi.md @@ -27,4 +27,28 @@ under the License. # Hudi -TODO: translate +## Usage + +1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read tables. In the future, it will support Snapshot Query on Merge-on-Read tables and Incremental Query. +2. Doris only supports Hive Metastore Catalogs currently. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions. + +## Create Catalog + +Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
+ +```sql +CREATE CATALOG hudi PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hadoop.username' = 'hive', + 'dfs.nameservices'='your-nameservice', + 'dfs.ha.namenodes.your-nameservice'='nn1,nn2', + 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007', + 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007', + 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' +); +``` + +## Column Type Mapping + +Same as that in Hive Catalogs. See the relevant section in [Hive](./hive). diff --git a/docs/en/docs/lakehouse/multi-catalog/iceberg.md b/docs/en/docs/lakehouse/multi-catalog/iceberg.md index bff7672543..67ce750066 100644 --- a/docs/en/docs/lakehouse/multi-catalog/iceberg.md +++ b/docs/en/docs/lakehouse/multi-catalog/iceberg.md @@ -27,4 +27,51 @@ under the License. # Iceberg -TODO: translate +## Usage + +When connecting to Iceberg, Doris: + +1. Supports Iceberg V1/V2 table formats; +2. Supports Position Delete but not Equality Delete for V2 format; +3. Only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs. + +## Create Catalog + +Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information. + +```sql +CREATE CATALOG iceberg PROPERTIES ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', + 'hadoop.username' = 'hive', + 'dfs.nameservices'='your-nameservice', + 'dfs.ha.namenodes.your-nameservice'='nn1,nn2', + 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007', + 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007', + 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' +); +``` + +## Column Type Mapping + +Same as that in Hive Catalogs. See the relevant section in [Hive](./hive). 
+ +## Time Travel + +<version since="dev"> + +Doris supports reading the specified Snapshot of Iceberg tables. + +</version> + +Each write operation to an Iceberg table will generate a new Snapshot. + +By default, a read request will only read the latest Snapshot. + +You can read data of historical table versions using the `FOR TIME AS OF` or `FOR VERSION AS OF` statements, based on the Snapshot ID or the time point when the Snapshot was generated. For example: + +`SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";` + +`SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;` + +You can use the [iceberg_meta](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/iceberg_meta/) table function to view the Snapshot details of the specified table. diff --git a/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md b/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md index 5118a62509..61dc900978 100644 --- a/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md +++ b/docs/en/docs/lakehouse/multi-catalog/multi-catalog.md @@ -261,7 +261,7 @@ See [Hudi](./hudi) ### Connect to Elasticsearch -See [Elasticsearch](./elasticsearch) +See [Elasticsearch](./es) ### Connect to JDBC diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md b/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md index 50fc541ada..aa9a7bc53d 100644 --- a/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md +++ b/docs/zh-CN/docs/lakehouse/multi-catalog/hive.md @@ -28,7 +28,7 @@ under the License. 通过连接 Hive Metastore,或者兼容 Hive Metatore 的元数据服务,Doris 可以自动获取 Hive 的库表信息,并进行数据查询。 -除了 Hive 外,很多其他系统也会使用 Hive Metastore 存储元数据。所以通过 Hive Catalog,我们不仅能方位 Hive,也能访问使用 Hive Metastore 作为元数据存储的系统。如 Iceberg、Hudi 等。 +除了 Hive 外,很多其他系统也会使用 Hive Metastore 存储元数据。所以通过 Hive Catalog,我们不仅能访问 Hive,也能访问使用 Hive Metastore 作为元数据存储的系统。如 Iceberg、Hudi 等。 ## 使用限制 @@ -38,7 +38,7 @@ under the License. 4.
支持数据存储在 Juicefs 上的 hive 表,用法如下(需要把juicefs-hadoop-x.x.x.jar放在 fe/lib/ 和 apache_hdfs_broker/lib/ 下)。 ## 创建 Catalog - + ```sql CREATE CATALOG hive PROPERTIES ( 'type'='hms', @@ -51,7 +51,7 @@ CREATE CATALOG hive PROPERTIES ( 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' ); ``` - + 除了 `type` 和 `hive.metastore.uris` 两个必须参数外,还可以通过更多参数来传递连接所需要的信息。 如提供 HDFS HA 信息,示例如下: @@ -68,7 +68,7 @@ CREATE CATALOG hive PROPERTIES ( 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' ); ``` - + 同时提供 HDFS HA 信息和 Kerberos 认证信息,示例如下: ```sql @@ -87,7 +87,7 @@ CREATE CATALOG hive PROPERTIES ( 'yarn.resourcemanager.principal' = 'your-rm-principal/_h...@your.com' ); ``` - + 提供 Hadoop KMS 加密传输信息,示例如下: ```sql @@ -110,7 +110,7 @@ CREATE CATALOG hive PROPERTIES ( 'juicefs.meta' = 'xxx' ); ``` - + 在 1.2.1 版本之后,我们也可以将这些信息通过创建一个 Resource 统一存储,然后在创建 Catalog 时使用这个 Resource。示例如下: ```sql @@ -126,12 +126,12 @@ CREATE RESOURCE hms_resource PROPERTIES ( 'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' ); -# 2. 创建 Catalog 并使用 Resource,这里的 Key Value 信息回覆盖 Resource 中的信息。 +# 2. 创建 Catalog 并使用 Resource,这里的 Key Value 信息会覆盖 Resource 中的信息。 CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES( 'key' = 'value' ); ``` - + 我们也可以直接将 hive-site.xml 放到 FE 和 BE 的 conf 目录下,系统也会自动读取 hive-site.xml 中的信息。信息覆盖的规则如下: * Resource 中的信息覆盖 hive-site.xml 中的信息。 diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md b/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md index 0f958a813e..5de988a9b1 100644 --- a/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md +++ b/docs/zh-CN/docs/lakehouse/multi-catalog/hudi.md @@ -37,7 +37,7 @@ under the License. 
和 Hive Catalog 基本一致,这里仅给出简单示例。其他示例可参阅 [Hive Catalog](./hive)。 ```sql -CREATE CATALOG iceberg PROPERTIES ( +CREATE CATALOG hudi PROPERTIES ( 'type'='hms', 'hive.metastore.uris' = 'thrift://172.21.0.1:7004', 'hadoop.username' = 'hive', --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org