This is an automated email from the ASF dual-hosted git repository. xxyu pushed a commit to branch document in repository https://gitbox.apache.org/repos/asf/kylin.git
The following commit(s) were added to refs/heads/document by this push: new 41e496e add document for KYLIN-4485 41e496e is described below commit 41e496e8aad7fb995bac14d95cb2e19f0a6e3e0c Author: Zhichao Zhang <441586...@qq.com> AuthorDate: Fri Jul 3 15:30:38 2020 +0800 add document for KYLIN-4485 1. fix according to comments 2. add CN version --- website/_docs/gettingstarted/quickstart.cn.md | 26 ++-- website/_docs/gettingstarted/quickstart.md | 40 +++--- website/_docs/howto/howto_use_mr_hive_dict.md | 11 +- website/_docs/install/kylin_docker.cn.md | 2 +- website/_docs/install/kylin_docker.md | 2 +- website/_docs/tutorial/cube_migration.cn.md | 164 +++++++++++++++++++++++++ website/_docs/tutorial/cube_migration.md | 52 +++++--- website/images/docs/quickstart/pull_docker.png | Bin 311525 -> 75775 bytes 8 files changed, 235 insertions(+), 62 deletions(-) diff --git a/website/_docs/gettingstarted/quickstart.cn.md b/website/_docs/gettingstarted/quickstart.cn.md index 18274dd..518c398 100644 --- a/website/_docs/gettingstarted/quickstart.cn.md +++ b/website/_docs/gettingstarted/quickstart.cn.md @@ -14,25 +14,23 @@ since: v0.6.x ### 一、 从docker镜像安装使用kylin(不需要提前准备hadoop环境) 为了让用户方便的试用 Kylin,我们提供了 Kylin 的 docker 镜像。该镜像中,Kylin 依赖的各个服务均已正确的安装及部署,包括: -- jdk 1.8 +- JDK 1.8 - Hadoop 2.7.0 - Hive 1.2.1 -- Hbase 1.1.2 +- Hbase 1.1.2 (with Zookeeper) - Spark 2.3.1 -- Zookeeper 3.4.6 - Kafka 1.1.1 -- Mysql -- Maven 3.6.1 +- MySQL 5.1.73 我们已将面向用户的 Kylin 镜像上传至 docker 仓库,用户无需在本地构建镜像,只需要安装docker,就可以体验kylin的一键安装。 #### step1、首先执行以下命令从 docker 仓库 pull 镜像: ``` -docker pull apachekylin/apache-kylin-standalone:3.0.1 +docker pull apachekylin/apache-kylin-standalone:3.1.0 ``` -此处的镜像包含的是kylin最新Release版本kylin 3.0.1。由于该镜像中包含了所有kylin依赖的大数据组件,所以拉取镜像需要的时间较长,请耐心等待。Pull成功后显示如下: +此处的镜像包含的是kylin最新Release版本kylin 3.1.0。由于该镜像中包含了所有kylin依赖的大数据组件,所以拉取镜像需要的时间较长,请耐心等待。Pull成功后显示如下:  #### step2、执行以下命令来启动容器: @@ -46,7 +44,7 @@ docker run -d \ -p 8032:8032 \ -p 8042:8042 \ -p 16010:16010 \ 
-apachekylin/apache-kylin-standalone:3.0.1 +apachekylin/apache-kylin-standalone:3.1.0 ``` 容器会很快启动,由于容器内指定端口已经映射到本机端口,可以直接在本机浏览器中打开各个服务的页面,如: @@ -74,7 +72,7 @@ KAFKA_HOME=/home/admin/kafka_2.11-1.1.1 SPARK_HOME=/home/admin/spark-2.3.1-bin-hadoop2.6 HBASE_HOME=/home/admin/hbase-1.1.2 HIVE_HOME=/home/admin/apache-hive-1.2.1-bin -KYLIN_HOME=/home/admin/apache-kylin-3.0.0-alpha2-bin-hbase1x +KYLIN_HOME=/home/admin/apache-kylin-3.1.0-bin-hbase1x ``` 使用ADMIN/KYLIN的用户名和密码组合登陆Kylin后,用户可以使用sample cube来体验cube的构建和查询,也可以按照下面“基于hadoop环境安装使用kylin”中从step8之后的教程来创建并查询属于自己的model和cube。 @@ -105,11 +103,11 @@ CentOS 6.5+ 或Ubuntu 16.0.4+ #### step1、下载kylin压缩包 -从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载一个适用于你的Hadoop版本的二进制文件。目前最新Release版本是kylin 3.0.1和kylin 2.6.5,其中3.0版本支持实时摄入数据进行预计算的功能。以CDH 5.的hadoop环境为例,可以使用如下命令行下载kylin 3.0.0: +从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载一个适用于你的Hadoop版本的二进制文件。目前最新Release版本是kylin 3.1.0和kylin 2.6.6,其中3.0版本支持实时摄入数据进行预计算的功能。以CDH 5.的hadoop环境为例,可以使用如下命令行下载kylin 3.1.0: ``` cd /usr/local/ -wget http://apache.website-solution.net/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-cdh57.tar.gz +wget http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz ``` #### step2、解压kylin @@ -117,8 +115,8 @@ wget http://apache.website-solution.net/kylin/apache-kylin-3.0.0/apache-kylin-3. 解压下载得到的kylin压缩包,并配置环境变量KYLIN_HOME指向解压目录: ``` -tar -zxvf apache-kylin-3.0.0-bin-cdh57.tar.gz -cd apache-kylin-3.0.0-bin-cdh57 +tar -zxvf apache-kylin-3.1.0-bin-cdh57.tar.gz +cd apache-kylin-3.1.0-bin-cdh57 export KYLIN_HOME=`pwd` ``` @@ -160,7 +158,7 @@ $KYLIN_HOME/bin/kylin.sh start ``` A new Kylin instance is started by root. 
To stop it, run 'kylin.sh stop' -Check the log at /usr/local/apache-kylin-3.0.0-bin-cdh57/logs/kylin.log +Check the log at /usr/local/apache-kylin-3.1.0-bin-cdh57/logs/kylin.log Web UI is at http://<hostname>:7070/kylin ``` diff --git a/website/_docs/gettingstarted/quickstart.md b/website/_docs/gettingstarted/quickstart.md index 161696e..27fe8dc 100644 --- a/website/_docs/gettingstarted/quickstart.md +++ b/website/_docs/gettingstarted/quickstart.md @@ -1,5 +1,5 @@ --- -layout: docs-cn +layout: docs title: Quick Start categories: start permalink: /docs/gettingstarted/kylin-quickstart.html @@ -14,15 +14,13 @@ Users can follow these steps to get an initial understanding of how to use Kylin In order to make it easy for users to try out Kylin, Zhu Weibin of Ant Financial has contributed “Kylin Docker Image” to the community. In this image, various services that Kylin depends on have been installed and deployed, including: -- Jdk 1.8 +- JDK 1.8 - Hadoop 2.7.0 - Hive 1.2.1 -- Hbase 1.1.2 +- Hbase 1.1.2 (with Zookeeper) - Spark 2.3.1 -- Zookeeper 3.4.6 - Kafka 1.1.1 -- Mysql -- Maven 3.6.1 +- MySQL 5.1.73 We have uploaded the user facing Kylin image to the Docker repository. Users do not need to build the image locally; they only need to install Docker to experience Kylin’s one-click installation. @@ -30,10 +28,10 @@ We have uploaded the user facing Kylin image to the Docker repository. Users do First, execute the following command to pull the image from the Docker repository: ``` -docker pull apachekylin/apache-kylin-standalone:3.0.1 +docker pull apachekylin/apache-kylin-standalone:3.1.0 ``` -The image here contains the latest version of Kylin: Kylin v3.0.1. This image contains all of the big data components that Kylin depends on, so it takes a long time to pull the image – please be patient. After the pull is successful, it is displayed as follows: +The image here contains the latest version of Kylin: Kylin v3.1.0. 
This image contains all of the big data components that Kylin depends on, so it takes a long time to pull the image – please be patient. After the pull is successful, it is displayed as follows:  @@ -49,7 +47,7 @@ docker run -d \ -p 8032:8032 \ -p 8042:8042 \ -p 16010:16010 \ -apachekylin/apache-kylin-standalone:3.0.1 +apachekylin/apache-kylin-standalone:3.1.0 ``` The container will start shortly. Since the specified port in the container has been mapped to the local port, you can directly open the pages of each service in the local browser, such as: @@ -68,13 +66,13 @@ When the container starts, the following services are automatically started: It will also automatically run $ KYLIN_HOME / bin / sample.sh and create a kylin_streaming_topic in Kafka and continue to send data to that topic to allow users to experience building and querying cubes in batches and streams as soon as the container is launched. Users can enter the container through the docker exec command. The relevant environment variables in the container are as follows: -- JAVA_HOME = / home / admin / jdk1.8.0_141 -- HADOOP_HOME = / home / admin / hadoop-2.7.0 -- KAFKA_HOME = / home / admin / kafka_2.11-1.1.1 -- SPARK_HOME = / home / admin / spark-2.3.1-bin-hadoop2.6 -- HBASE_HOME = / home / admin / hbase-1.1.2 -- HIVE_HOME = / home / admin / apache-hive-1.2.1-bin -- KYLIN_HOME = / home / admin / apache-kylin-3.0.0-alpha2-bin-hbase1x +- JAVA_HOME = /home/admin/jdk1.8.0_141 +- HADOOP_HOME = /home/admin/hadoop-2.7.0 +- KAFKA_HOME = /home/admin/kafka_2.11-1.1.1 +- SPARK_HOME = /home/admin/spark-2.3.1-bin-hadoop2.6 +- HBASE_HOME = /home/admin/hbase-1.1.2 +- HIVE_HOME = /home/admin/apache-hive-1.2.1-bin +- KYLIN_HOME = /home/admin/apache-kylin-3.1.0-bin-hbase1x After logging in to Kylin with user/password of ADMIN/KYLIN, users can use the sample cube to experience the construction and query of the cube, or they can create and query their own models and cubes by following the tutorial from Step 8 in “Install 
and Use Kylin Based on a Hadoop Environment” below. @@ -110,19 +108,19 @@ It is recommended to use an integrated Hadoop environment for Kylin installation When your environment meets the above prerequisites, you can install and start using Kylin. #### Step1. Download the Kylin Archive -Download a binary for your version of Hadoop from [Apache Kylin Download Site](https://kylin.apache.org/download/). Currently, the latest versions are Kylin 3.0.1 and Kylin 2.6.5, of which, version 3.0 supports the function of ingesting data in real time for pre-calculation. If your Hadoop environment is CDH 5.7, you can download Kylin 3.0.0 using the following command line: +Download a binary for your version of Hadoop from [Apache Kylin Download Site](https://kylin.apache.org/download/). Currently, the latest versions are Kylin 3.1.0 and Kylin 2.6.6, of which, version 3.0 supports the function of ingesting data in real time for pre-calculation. If your Hadoop environment is CDH 5.7, you can download Kylin 3.1.0 using the following command line: ``` cd /usr/local/ -wget http://apache.website-solution.net/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-cdh57.tar.gz +wget http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz ``` #### Step2. Extract Kylin Extract the downloaded Kylin archive and configure the environment variable KYLIN_HOME to point to the extracted directory: ``` -tar -zxvf apache-kylin-3.0.0-bin-cdh57.tar.gz -cd apache-kylin-3.0.0-bin-cdh57 +tar -zxvf apache-kylin-3.1.0-bin-cdh57.tar.gz +cd apache-kylin-3.1.0-bin-cdh57 export KYLIN_HOME=`pwd` ``` @@ -157,7 +155,7 @@ Start script to start Kylin. If the startup is successful, the following will be ``` A new Kylin instance is started by root. 
To stop it, run 'kylin.sh stop'
-Check the log at /usr/local/apache-kylin-3.0.0-bin-cdh57/logs/kylin.log
+Check the log at /usr/local/apache-kylin-3.1.0-bin-cdh57/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
```
diff --git a/website/_docs/howto/howto_use_mr_hive_dict.md b/website/_docs/howto/howto_use_mr_hive_dict.md index b9f5c96..bfaf483 100644
--- a/website/_docs/howto/howto_use_mr_hive_dict.md
+++ b/website/_docs/howto/howto_use_mr_hive_dict.md
@@ -8,11 +8,12 @@ permalink: /docs/howto/howto_use_hive_mr_dict.html
## Global Dictionary in Hive
### Background
-Count distinct(bitmap) measure is very important for many scenario, such as PageView statistics, and Kylin support count distinct since 1.5.3 .
-Apache Kylin implements precisely count distinct measure based on bitmap, and use global dictionary to encode string value into integer.
-Currently we have to build global dictionary in single process/JVM, which may take a lot of time and memory for UHC.
-Kylin v3.0.0 introduce Hive global dictionary v1(KYLIN-3841). By this feature, we use Hive, a distributed SQL engine to build global dictionary.
-For improve performance, kylin v3.1.0 use MapReduce replace HQL in some steps, introduce Hive global dictionary v2(KYLIN-4342).
+
+- The count distinct (bitmap) measure is very important for many scenarios, such as PageView statistics; Kylin has supported count distinct since v1.5.3.
+- Apache Kylin implements a precise count distinct measure based on bitmaps, and uses a global dictionary to encode string values into integers.
+- Previously, the global dictionary had to be built in a single process/JVM, which may take a lot of time and memory for ultra-high-cardinality (UHC) columns.
+- Kylin v3.0.0 introduced Hive global dictionary v1 (KYLIN-3841), which uses Hive, a distributed SQL engine, to build the global dictionary.
+- To improve performance, Kylin v3.1.0 replaces HQL with MapReduce in some steps, introducing Hive global dictionary v2 (KYLIN-4342).
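The idea behind the background above — a global dictionary assigns each distinct string a stable integer ID, so per-segment bitmaps of IDs can be merged for an exact count distinct — can be sketched as a toy (this is illustrative only, not Kylin's actual implementation; real deployments use Roaring bitmaps, not Python integers):

```python
# Toy sketch of a global dictionary for exact count distinct (NOT Kylin's code).
# The mapping from value to ID must stay consistent across all cube segments,
# otherwise bitmaps from different segments cannot be merged correctly.

class GlobalDictionary:
    def __init__(self):
        self._ids = {}

    def encode(self, value):
        # Assign the next integer ID the first time a value is seen.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]

def bitmap(dictionary, values):
    # Encode each value and set the corresponding bit.
    bits = 0
    for v in values:
        bits |= 1 << dictionary.encode(v)
    return bits

d = GlobalDictionary()
seg1 = bitmap(d, ["user_a", "user_b"])  # built while processing one segment
seg2 = bitmap(d, ["user_b", "user_c"])  # built while processing another
merged = seg1 | seg2                    # merging segments is a cheap bitwise OR
print(bin(merged).count("1"))           # exact distinct count: 3
```

Because the dictionary is shared, "user_b" gets the same ID in both segments and is counted only once after the merge; building this shared dictionary is exactly the step that Hive global dictionary v1/v2 distribute.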
### Benefit Summary 1.Build Global Dictionary in distributed way, thus building job spent less time. diff --git a/website/_docs/install/kylin_docker.cn.md b/website/_docs/install/kylin_docker.cn.md index 9d63554..e0bf234 100644 --- a/website/_docs/install/kylin_docker.cn.md +++ b/website/_docs/install/kylin_docker.cn.md @@ -8,7 +8,7 @@ since: v3.0.0 为了让用户方便的试用 Kylin,以及方便开发者在修改了源码后进行验证及调试。我们提供了 Kylin 的 docker 镜像。该镜像中,Kylin 依赖的各个服务均已正确的安装及部署,包括: -- Jdk 1.8 +- JDK 1.8 - Hadoop 2.7.0 - Hive 1.2.1 - Hbase 1.1.2 (with Zookeeper) diff --git a/website/_docs/install/kylin_docker.md b/website/_docs/install/kylin_docker.md index 3521b77..aba06ec 100644 --- a/website/_docs/install/kylin_docker.md +++ b/website/_docs/install/kylin_docker.md @@ -8,7 +8,7 @@ since: v3.0.0 In order to allow users to easily try Kylin, and to facilitate developers to verify and debug after modifying the source code. We provide Kylin's docker image. In this image, each service that Kylin relies on is properly installed and deployed, including: -- Jdk 1.8 +- JDK 1.8 - Hadoop 2.7.0 - Hive 1.2.1 - Hbase 1.1.2 (with Zookeeper) diff --git a/website/_docs/tutorial/cube_migration.cn.md b/website/_docs/tutorial/cube_migration.cn.md new file mode 100644 index 0000000..d9cde69 --- /dev/null +++ b/website/_docs/tutorial/cube_migration.cn.md @@ -0,0 +1,164 @@ +--- +layout: docs-cn +title: "Cube 迁移" +categories: 教程 +permalink: /cn/docs/tutorial/cube_migration.html +since: v3.1.0 +--- + +Cube迁移功能主要用于把QA环境下的Cube迁移到PROD环境下,Kylin v3.1.0对这个功能进行了加强,加强的功能列表如下: + +- 在迁移前,Kylin会使用内部定义的一些规则对Cube的质量及兼容性做校验,之前的版本则需要人工去校验; +- 通过邮件的方式发送迁移请求及迁移结果通知,取代之前的人工沟通; +- 支持跨Hadoop集群的迁移功能; + +## I. 在同一个Hadoop集群下的Cube迁移 + +提供如下两种方式来迁移同一个Hadoop集群下的Cube: + +- 使用Kylin portal; +- 使用工具类'CubeMigrationCLI.java'; + +### 1. 迁移的前置条件 + +1. Cube迁移的操作按钮只有Cube的管理员才可见。 +2. 在迁移前,必须对要迁移的Cube进行构建,确认查询性能,Cube的状态必须是**READY**。 +3. 配置项'**kylin.cube.migration.enabled**'必须是true。 +4. 确保Cube要迁移的目标项目(PROD环境下)必须存在。 +5. 
QA环境和PROD环境必须在同一个Hadoop集群下, 即具有相同的 HDFS, HBase and HIVE等。 + +### 2. 通过Web界面进行Cube迁移的步骤 + +首先,要确保有操作Cube的权限。 + +#### 步骤 1 +在QA环境里的 'Model' 页面,点击'Actions'列中的'Action'下拉列表,选择'Migrate'操作: + +  + +#### 步骤 2 +在点击'Migrate'按钮后, 将会出现一个弹出框: + +  + +#### 步骤 3 +在弹出框中输入PROD环境的目标项目名称,使用QA环境的项目名称作为默认值。 + +#### 步骤 4 +在弹出框中点击'Validate'按钮,将会在后端对迁移的Cube做一些验证,待验证完毕,会出现验证结果的弹出框。 + + **验证异常及解决方法** + + - `The target project XXX does not exist on PROD-KYLIN-INSTANCE:7070`: 输入的PROD环境的目标项目名称必须存在。 + + - `Cube email notification list is not set or empty`: 要迁移的Cube的邮件通知列表不能为空。 + + **建议性提示** + + - `Auto merge time range for cube XXXX is not set`: 建议设置Cube的配置项:'Auto Merge Threshold'。 + - `ExpansionRateRule: failed on expansion rate check with exceeding 5`: Cube的膨胀率超过配置项'kylin.cube.migration.expansion-rate'配置的值,可以设置为一个合理的值。 + - `Failed on query latency check with average cost 5617 exceeding 2000ms`: 如果设置配置项'kylin.cube.migration.rule-query-latency-enabled'为true, 在验证阶段后端会自动生成一些SQL来测试Cube的查询性能,可以合理设置配置项'kylin.cube.migration.query-latency-seconds'的值。 + +#### 步骤 5 + +待验证通过,点击'Submit'按钮发起Cube迁移请求给Cube的管理员。后端会自动发送请求邮件给Cube管理员: + +  + +#### 步骤 6 +Cube管理员在接收到Cube迁移请求邮件后,可以通过'Model'页面里'Admins'列的'Action'下拉列表,选择'Approve Migration'操作还是'Reject Migration'操作,同时后端会自动发送请求结果邮件给请求者: + +  + +#### 步骤 7 +如果Cube管理员选择'Approve Migration',将会出现如下弹出框: + +  + +在弹出框输入正确的目标项目名称,点击'Approve'按钮,后端开始迁移Cube。 + +#### 步骤 8 +迁移Cube成功后,将会出现如下弹出框,显示迁移成功: + +  + +#### 步骤 9 +最后, 在PROD环境下的'Model'页面,迁移的Cube会出现在列表中,且状态是**DISABLED**。 + +### 3. 
使用'CubeMigrationCLI.java'工具类进行迁移 + +#### 作用 +CubeMigrationCLI.java 用于迁移 cubes。例如:将 cube 从测试环境迁移到生产环境。请注意,不同的环境是共享相同的 Hadoop 集群,包括 HDFS,HBase 和 HIVE。此 CLI 不支持跨 Hadoop 集群的数据迁移。 + +#### 如何使用 + +前八个参数必须有且次序不能改变。 +{% highlight Groff markup %} +./bin/kylin.sh org.apache.kylin.tool.CubeMigrationCLI <srcKylinConfigUri> <dstKylinConfigUri> <cubeName> <projectName> <copyAclOrNot> <purgeOrNot> <overwriteIfExists> <realExecute> <migrateSegmentOrNot> +{% endhighlight %} +例如: +{% highlight Groff markup %} +./bin/kylin.sh org.apache.kylin.tool.CubeMigrationCLI kylin-qa:7070 kylin-prod:7070 kylin_sales_cube learn_kylin true false false true false +{% endhighlight %} +命令执行成功后,请 reload metadata,您想要迁移的 cube 将会存在于迁移后的 project 中。 + +下面会列出所有支持的参数: +- 如果您使用 `cubeName` 这个参数,但想要迁移的 cube 所对应的 model 在要迁移的环境中不存在,model 的数据也会迁移过去。 +- 如果您将 `overwriteIfExists` 设置为 false,且该 cube 已存在于要迁移的环境中,当您运行命令,cube 存在的提示信息将会出现。 +- 如果您将 `migrateSegmentOrNot` 设置为 true,请保证 Kylin metadata 的 HDFS 目录存在且 Cube 的状态为 READY。 + +| Parameter | Description | +| ------------------- | :----------------------------------------------------------------------------------------- | +| srcKylinConfigUri | The URL of the source environment's Kylin configuration. It can be `host:7070`, or an absolute file path to the `kylin.properties`. | +| dstKylinConfigUri | The URL of the target environment's Kylin configuration. | +| cubeName | the name of Cube to be migrated.(Make sure it exist) | +| projectName | The target project in the target environment.(Make sure it exist) | +| copyAclOrNot | `true` or `false`: whether copy Cube ACL to target environment. | +| purgeOrNot | `true` or `false`: whether purge the Cube from src server after the migration. | +| overwriteIfExists | `true` or `false`: overwrite cube if it already exists in the target environment. | +| realExecute | `true` or `false`: if false, just print the operations to take, if true, do the real migration. 
| +| migrateSegmentOrNot | (Optional) true or false: whether copy segment data to target environment. Default true. | + +## II. 跨Hadoop集群下的Cube迁移 + +**注意**: + +- 当前只支持使用工具类'CubeMigrationCrossClusterCLI.java'来进行跨Hadoop集群下的Cube迁移。 +- 跨Hadoop集群的Cube迁移,支持同时把Cube数据从QA环境迁移到PROD环境。 + + +### 1. 迁移的前置条件 +1. 在迁移前,必须对要迁移的Cube进行构建Segment,确认查询性能,Cube的状态必须是**READY**。 +2. PROD环境下的目标项目名称必须和QA环境下的项目名称一致。 + +### 2. 如何使用工具类'CubeMigrationCrossClusterCLI.java'来迁移Cube + +{% highlight Groff markup %} +./bin/kylin.sh org.apache.kylin.tool.migration.CubeMigrationCrossClusterCLI <kylinUriSrc> <kylinUriDst> <updateMappingPath> <cube> <hybrid> <project> <all> <dstHiveCheck> <overwrite> <schemaOnly> <execute> <coprocessorPath> <codeOfFSHAEnabled> <distCpJobQueue> <distCpJobMemory> <nThread> +{% endhighlight %} +例如: +{% highlight Groff markup %} +./bin/kylin.sh org.apache.kylin.tool.migration.CubeMigrationCrossClusterCLI -kylinUriSrc ADMIN:ky...@qa.env:17070 -kylinUriDst ADMIN:ky...@prod.env:17777 -cube kylin_sales_cube -updateMappingPath $KYLIN_HOME/updateTableMapping.json -execute true -schemaOnly false -overwrite true +{% endhighlight %} +命令执行成功后,在PROD环境下的'Model'页面,迁移的Cube会出现在列表中,且状态是**READY**。 + +下面会列出所有支持的参数: + +| Parameter | Description | +| ------------------- | :----------------------------------------------------------------------------------------- | +| kylinUriSrc | (Required) The source kylin uri with format user:pwd@host:port. | +| kylinUriDst | (Required) The target kylin uri with format user:pwd@host:port. | +| updateMappingPath | (Optional) The path for the update Hive table mapping file, the format is json. | +| cube | The cubes which you want to migrate, separated by ','. | +| hybrid | The hybrids which you want to migrate, separated by ','. | +| project | The projects which you want to migrate, separated by ','. | +| all | Migrate all projects. **Note**: You must add only one of above four parameters: 'cube', 'hybrid', 'project' or 'all'. 
| dstHiveCheck | (Optional) Whether to check target Hive tables, the default value is true. |
| overwrite | (Optional) Whether to overwrite existing cubes, the default value is false. |
| schemaOnly | (Optional) Whether to migrate only the cube-related schema, the default value is true. **Note**: If set to false, it will migrate cube data too. |
| execute | (Optional) Whether to actually execute the migration, the default value is false. |
| coprocessorPath | (Optional) The path of the coprocessor to be deployed, the default value is obtained from KylinConfigBase.getCoprocessorLocalJar(). |
| codeOfFSHAEnabled | (Optional) Whether NameNode HA is enabled on the clusters. |
| distCpJobQueue | (Optional) The mapreduce.job.queuename for the DistCp job. |
| distCpJobMemory | (Optional) The mapreduce.map.memory.mb for the DistCp job. |
| nThread | (Optional) The number of threads for migrating cube data in parallel. |
diff --git a/website/_docs/tutorial/cube_migration.md b/website/_docs/tutorial/cube_migration.md index e580fed..cc4ca67 100644
--- a/website/_docs/tutorial/cube_migration.md
+++ b/website/_docs/tutorial/cube_migration.md
@@ -6,17 +6,29 @@ permalink: /docs/tutorial/cube_migration.html since: v3.1.0 ---
-## Migrate on the same Hadoop cluster
+Cube migration is used to migrate a cube from the QA env to the PROD env. Kylin v3.1.0 enhances this feature as follows:
-### Pre-requisites to use cube migration
+- Kylin checks the quality and compatibility of the cube against internal rules before migration, instead of relying on manual checks;
+- Migration requests and result notifications are sent by email, replacing manual communication;
+- Migration across two Hadoop clusters is supported;
+
+## I. Migrate on the same Hadoop cluster
+
+There are two ways to migrate a cube from the QA env to the PROD env on the same Hadoop cluster:
+
+- Use the Kylin portal;
+- Use the 'CubeMigrationCLI.java' CLI;
+
+### 1. Prerequisites to use cube migration
1.
Only cube admins can migrate cubes, as the "migrate" button is **ONLY** visible to cube admins.
-2. The cube status must be **ready** before migration which you have built the segment and confirmed the performance.
+2. The cube status must be **READY** before migration, which means you have built the segments and confirmed the query performance.
3. The property '**kylin.cube.migration.enabled**' must be true.
4. The target project must exist on the Kylin PROD env before migration.
5. The QA env and PROD env must share the same Hadoop cluster, including HDFS, HBase and HIVE.
-### Steps to migrate a cube through the Kylin portal
+### 2. Steps to migrate a cube through the Kylin portal
+
First of all, make sure that you have the required permissions on the cube you want to migrate.

#### Step 1 @@ -33,7 +45,7 @@ After you click the 'Migrate' button, you will see a pop-up window: Check if the target project name is what you want. It uses the same project name on QA env as the default target project name. If the target project name is different on PROD env, please replace it with the correct one.

#### Step 4
-Click 'Validate' button to verify the cube validity. It may take couple of minutes to validate the cube on the backend and show the validity results on a pop-up window:
+Click the 'Validate' button to verify the cube's validity. It may take a couple of minutes to validate the cube on the backend and show the validity results in a pop-up window.

**Common exceptions and suggested solutions** @@ -54,7 +66,7 @@ If validations are ok, click the 'Submit' button to send the migration request email 

#### Step 6
-Cubes administrator will receive a migration request email, and can click the 'Action' drop down button in the 'Actions' column and select operation 'Approve Migration' button to migrate cube or select 'Reject Migration' button to reject request.
It also will send a notification email to the migration requester:
+The cube administrator will receive a migration request email, and can click the 'Action' drop-down button in the 'Admins' column and select 'Approve Migration' to migrate the cube or 'Reject Migration' to reject the request. A notification email will also be sent to the migration requester:



#### Step 7 @@ -73,7 +85,7 @@ If the migration succeeds, it will show the message below:

#### Step 9
Finally, go to the Kylin portal on PROD env and refresh the 'Model' page; you will see the cube you migrated from QA env, and the status of this cube is **DISABLED**.

-### Use 'CubeMigrationCLI.java' CLI to migrate cube
+### 3. Use 'CubeMigrationCLI.java' CLI to migrate cube

#### Function
CubeMigrationCLI.java can migrate a cube from one Kylin environment to another, for example, promoting a well-tested cube from the testing env to the production env. Note that the different Kylin environments should share the same Hadoop cluster, including HDFS, HBase and HIVE. @@ -92,9 +104,10 @@ For example: After the command is successfully executed, please reload Kylin metadata; the cube you want to migrate will appear in the target environment.

All supported parameters are listed below: - If the data model of the cube you want to migrate does not exist in the target environment, this tool will also migrate the model. - If you set `overwriteIfExists` to `false`, and the cube exists in the target environment, the tool will stop to proceed. - If you set `migrateSegmentOrNot` to `true`, please make sure the cube has `READY` segments, they will be migrated to target environment together.
+
+- If the data model of the cube you want to migrate does not exist in the target environment, this tool will also migrate the model.
+- If you set `overwriteIfExists` to `false`, and the cube already exists in the target environment, the tool will stop and not proceed.
+- If you set `migrateSegmentOrNot` to `true`, please make sure the cube has `READY` segments; they will be migrated to the target environment together.

| Parameter | Description |
| ------------------- | :----------------------------------------------------------------------------------------- | @@ -108,15 +121,18 @@ All supported parameters are listed below:
| realExecute | `true` or `false`: If false, just print the operations to take (dry-run mode); if true, do the real migration. |
| migrateSegmentOrNot | (Optional) `true` or `false`: whether copy segment info to the target environment. Default true. |

-## Migrate across two Hadoop clusters
+## II. Migrate across two Hadoop clusters

-**Note**: Currently it just supports to use 'CubeMigrationCrossClusterCLI.java' CLI to migrate cube across two Hadoop clusters.
+**Note**:

-### Pre-requisitions to use cube migration
-1. The cube status must be **ready** before migration which you have built the segment and confirmed the performance.
+- Currently, migrating a cube across two Hadoop clusters is only supported via the 'CubeMigrationCrossClusterCLI.java' CLI.
+- It also supports migrating cube data (segment data on HBase) from the QA env to the PROD env.
+
+### 1. Prerequisites to use cube migration
+1. The cube status must be **READY** before migration, which means you have built the segments and confirmed the query performance.
2. The target project name on PROD env must be the same as the one on QA env.

-### How to use 'CubeMigrationCrossClusterCLI.java' CLI to migrate cube
+### 2.
How to use 'CubeMigrationCrossClusterCLI.java' CLI to migrate cube

{% highlight Groff markup %}
./bin/kylin.sh org.apache.kylin.tool.migration.CubeMigrationCrossClusterCLI <kylinUriSrc> <kylinUriDst> <updateMappingPath> <cube> <hybrid> <project> <all> <dstHiveCheck> <overwrite> <schemaOnly> <execute> <coprocessorPath> <codeOfFSHAEnabled> <distCpJobQueue> <distCpJobMemory> <nThread>
@@ -140,14 +156,10 @@ All supported parameters are listed below:
| all | Migrate all projects. **Note**: You must add exactly one of the above four parameters: 'cube', 'hybrid', 'project' or 'all'. |
| dstHiveCheck | (Optional) Whether to check target Hive tables, the default value is true. |
| overwrite | (Optional) Whether to overwrite existing cubes, the default value is false. |
-| schemaOnly | (Optional) Whether only migrate cube related schema, the default value is true. |
+| schemaOnly | (Optional) Whether to migrate only the cube-related schema, the default value is true. **Note**: If set to false, it will migrate cube data too. |
| execute | (Optional) Whether to actually execute the migration, the default value is false. |
| coprocessorPath | (Optional) The path of the coprocessor to be deployed, the default value is obtained from KylinConfigBase.getCoprocessorLocalJar(). |
| codeOfFSHAEnabled | (Optional) Whether NameNode HA is enabled on the clusters. |
| distCpJobQueue | (Optional) The mapreduce.job.queuename for the DistCp job. |
| distCpJobMemory | (Optional) The mapreduce.map.memory.mb for the DistCp job. |
| nThread | (Optional) The number of threads for migrating cube data in parallel. |
-
-
-
-
diff --git a/website/images/docs/quickstart/pull_docker.png b/website/images/docs/quickstart/pull_docker.png index 5b5a88c..16236e4 100644 Binary files a/website/images/docs/quickstart/pull_docker.png and b/website/images/docs/quickstart/pull_docker.png differ
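The CubeMigrationCLI invocation shown in this patch takes its first eight arguments positionally, in a fixed order, which is easy to get wrong by hand. As an illustration only (this helper is hypothetical and not part of Kylin), the command line can be assembled programmatically so the documented order is always respected:

```python
# Hypothetical helper (NOT shipped with Kylin): builds the CubeMigrationCLI
# command line, enforcing the positional-argument order documented above:
# srcUri dstUri cubeName projectName copyAcl purge overwrite realExecute migrateSegment

def build_migration_cmd(src_uri, dst_uri, cube, project,
                        copy_acl=True, purge=False, overwrite=False,
                        real_execute=False, migrate_segment=True):
    def flag(b):
        # The CLI expects literal lowercase "true"/"false" strings.
        return "true" if b else "false"
    return ["./bin/kylin.sh", "org.apache.kylin.tool.CubeMigrationCLI",
            src_uri, dst_uri, cube, project,
            flag(copy_acl), flag(purge), flag(overwrite),
            flag(real_execute), flag(migrate_segment)]

# Reproduces the example invocation from the documentation.
cmd = build_migration_cmd("kylin-qa:7070", "kylin-prod:7070",
                          "kylin_sales_cube", "learn_kylin",
                          copy_acl=True, real_execute=True,
                          migrate_segment=False)
print(" ".join(cmd))
# ./bin/kylin.sh org.apache.kylin.tool.CubeMigrationCLI kylin-qa:7070 kylin-prod:7070 kylin_sales_cube learn_kylin true false false true false
```

Passing the resulting list to `subprocess.run(cmd)` on a machine with a Kylin checkout would perform the migration; with `real_execute=False` the CLI itself runs in dry-run mode and only prints the operations it would take.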