This is an automated email from the ASF dual-hosted git repository. xxyu pushed a commit to branch document in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 89f950a93a57a7e293b7ccd056a83bb1821de039 Author: yaqian.zhang <598593...@qq.com> AuthorDate: Wed Jul 1 19:13:52 2020 +0800 Update hive global dictionary doc and some doc update for kylin 3.1.0 --- website/_data/docs31-cn.yml | 3 +- website/_data/docs31.yml | 6 + website/_docs31/gettingstarted/quickstart.cn.md | 301 ++++++ website/_docs31/gettingstarted/quickstart.md | 303 ++++++ website/_docs31/howto/howto_use_mr_hive_dict.cn.md | 65 ++ website/_docs31/howto/howto_use_mr_hive_dict.md | 67 ++ website/_docs31/howto/howto_use_restapi.cn.md | 1064 ++++++++++++++----- website/_docs31/howto/howto_use_restapi.md | 1077 +++++++++++++++----- website/_docs31/security.md | 76 ++ .../lambda_mode_and_timezone_realtime_olap.md | 175 ++++ .../Hive-Global-Dictionary/add-count-distinct.png | Bin 0 -> 151385 bytes .../Hive-Global-Dictionary/cube-level-config.png | Bin 189891 -> 0 bytes .../hive-global-dict-table.png | Bin 39817 -> 0 bytes .../Hive-Global-Dictionary/new-added-step-1.png | Bin 0 -> 57868 bytes .../Hive-Global-Dictionary/new-added-step-2.png | Bin 0 -> 76792 bytes .../set-hive-dict-cloumn.png | Bin 0 -> 133281 bytes .../set-hive-dict-column.png | Bin 134436 -> 0 bytes .../Hive-Global-Dictionary/three-added-steps.png | Bin 185211 -> 0 bytes 18 files changed, 2653 insertions(+), 484 deletions(-) diff --git a/website/_data/docs31-cn.yml b/website/_data/docs31-cn.yml index 8295b94..f647a3d 100644 --- a/website/_data/docs31-cn.yml +++ b/website/_data/docs31-cn.yml @@ -16,6 +16,7 @@ docs: - index - gettingstarted/faq + - gettingstarted/kylin-quickstart - title: 安装 docs: @@ -70,4 +71,4 @@ - howto/howto_backup_metadata - howto/howto_cleanup_storage - howto/howto_use_cli - + - howto/howto_use_hive_mr_dict diff --git a/website/_data/docs31.yml b/website/_data/docs31.yml index f25d092..28ff721 100644 --- a/website/_data/docs31.yml +++ b/website/_data/docs31.yml @@ -24,6 +24,7 @@ - gettingstarted/faq - gettingstarted/events - gettingstarted/best_practices + - 
gettingstarted/kylin-quickstart - title: Installation docs: @@ -55,6 +56,7 @@ - tutorial/setup_jdbc_datasource - tutorial/hybrid - tutorial/mysql_metastore + - tutorial/lambda_mode_and_timezone_realtime_olap - title: Integration docs: @@ -88,3 +90,7 @@ - howto/howto_enable_zookeeper_acl - howto/howto_use_health_check_cli - howto/howto_use_hive_mr_dict + +- title: Security + docs: + - security \ No newline at end of file diff --git a/website/_docs31/gettingstarted/quickstart.cn.md b/website/_docs31/gettingstarted/quickstart.cn.md new file mode 100644 index 0000000..01e7b4b --- /dev/null +++ b/website/_docs31/gettingstarted/quickstart.cn.md @@ -0,0 +1,301 @@ +--- +layout: docs-cn +title: 快速开始 +categories: 开始 +permalink: /cn/docs31/gettingstarted/kylin-quickstart.html +since: v0.6.x +--- + +这里是一份从下载安装到体验亚秒级查询的完整流程,分别介绍了有hadoop环境(基于hadoop环境的安装)和没有hadoop环境(从docker镜像安装)两种场景下kylin的安装使用,用户可以根据自己的环境选择其中的任意一种方式。 +你可以按照文章里的步骤对kylin进行初步的了解和体验,掌握kylin的基本使用技能,然后结合自己的业务场景使用kylin来设计模型,加速查询。 + + + +### 一、 从docker镜像安装使用kylin(不需要提前准备hadoop环境) +为了让用户方便的试用 Kylin,我们提供了 Kylin 的 docker 镜像。该镜像中,Kylin 依赖的各个服务均已正确的安装及部署,包括: + +- jdk 1.8 +- Hadoop 2.7.0 +- Hive 1.2.1 +- Hbase 1.1.2 +- Spark 2.3.1 +- Zookeeper 3.4.6 +- Kafka 1.1.1 +- Mysql +- Maven 3.6.1 + +我们已将面向用户的 Kylin 镜像上传至 docker 仓库,用户无需在本地构建镜像,只需要安装docker,就可以体验kylin的一键安装。 + +#### step1、首先执行以下命令从 docker 仓库 pull 镜像: + +``` +docker pull apachekylin/apache-kylin-standalone:3.0.1 +``` + +此处的镜像包含的是kylin最新Release版本kylin 3.0.1。由于该镜像中包含了所有kylin依赖的大数据组件,所以拉取镜像需要的时间较长,请耐心等待。Pull成功后显示如下: + + +#### step2、执行以下命令来启动容器: + +``` +docker run -d \ +-m 8G \ +-p 7070:7070 \ +-p 8088:8088 \ +-p 50070:50070 \ +-p 8032:8032 \ +-p 8042:8042 \ +-p 16010:16010 \ +apachekylin/apache-kylin-standalone:3.0.1 +``` + +容器会很快启动,由于容器内指定端口已经映射到本机端口,可以直接在本机浏览器中打开各个服务的页面,如: + +- Kylin 页面:http://127.0.0.1:7070/kylin/ +- Hdfs NameNode 页面:http://127.0.0.1:50070 +- Yarn ResourceManager 页面:http://127.0.0.1:8088 +- HBase 页面:http://127.0.0.1:60010 + +容器启动时,会自动启动以下服务: + +- NameNode, 
DataNode +- ResourceManager, NodeManager +- HBase +- Kafka +- Kylin + +并自动运行 $KYLIN_HOME/bin/sample.sh及在 Kafka 中创建 kylin_streaming_topic topic 并持续向该 topic 中发送数据。这是为了让用户启动容器后,就能体验以批和流的方式的方式构建 Cube 并进行查询。 +用户可以通过docker exec命令进入容器,容器内相关环境变量如下: + +``` +JAVA_HOME=/home/admin/jdk1.8.0_141 +HADOOP_HOME=/home/admin/hadoop-2.7.0 +KAFKA_HOME=/home/admin/kafka_2.11-1.1.1 +SPARK_HOME=/home/admin/spark-2.3.1-bin-hadoop2.6 +HBASE_HOME=/home/admin/hbase-1.1.2 +HIVE_HOME=/home/admin/apache-hive-1.2.1-bin +KYLIN_HOME=/home/admin/apache-kylin-3.0.0-alpha2-bin-hbase1x +``` + +使用ADMIN/KYLIN的用户名和密码组合登陆Kylin后,用户可以使用sample cube来体验cube的构建和查询,也可以按照下面“基于hadoop环境安装使用kylin”中从step8之后的教程来创建并查询属于自己的model和cube。 + +### 二、 基于hadoop环境安装使用kylin +对于已经有稳定hadoop环境的用户,可以下载kylin的二进制包将其部署安装在自己的hadoop集群。安装之前请根据以下要求进行环境检查: + +- 前置条件: +Kylin 依赖于 Hadoop 集群处理大量的数据集。您需要准备一个配置好 HDFS,YARN,MapReduce,Hive, HBase,Zookeeper 和其他服务的 Hadoop 集群供 Kylin 运行。 +Kylin 可以在 Hadoop 集群的任意节点上启动。方便起见,您可以在 master 节点上运行 Kylin。但为了更好的稳定性,我们建议您将 Kylin 部署在一个干净的 Hadoop client 节点上,该节点上 Hive,HBase,HDFS 等命令行已安装好且 client 配置(如 core-site.xml,hive-site.xml,hbase-site.xml及其他)也已经合理的配置且其可以自动和其它节点同步。 +运行 Kylin 的 Linux 账户要有访问 Hadoop 集群的权限,包括创建/写入 HDFS 文件夹,Hive 表, HBase 表和提交 MapReduce 任务的权限。 + +- 硬件要求: +运行 Kylin 的服务器建议最低配置为 4 core CPU,16 GB 内存和 100 GB 磁盘。 + +- 操作系统要求: +CentOS 6.5+ 或Ubuntu 16.0.4+ + +- 软件要求: + - Hadoop 2.7+,3.0 + - Hive 0.13+,1.2.1+ + - Hbase 1.1+,2.0(从kylin 2.5开始支持) + - JDK: 1.8+ + +建议使用集成的Hadoop环境进行kylin的安装与测试,比如Hortonworks HDP 或Cloudera CDH ,kylin发布前在 Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, Azure HDInsight 3.5-3.6上测试通过。 + +当你的环境满足上述前置条件时 ,你可以开始安装使用kylin。 + +#### step1、下载kylin压缩包 + +从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载一个适用于你的Hadoop版本的二进制文件。目前最新Release版本是kylin 3.0.1和kylin 2.6.5,其中3.0版本支持实时摄入数据进行预计算的功能。以CDH 5.的hadoop环境为例,可以使用如下命令行下载kylin 3.0.0: + +``` +cd /usr/local/ +wget 
http://apache.website-solution.net/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-cdh57.tar.gz +``` + +#### step2、解压kylin + +解压下载得到的kylin压缩包,并配置环境变量KYLIN_HOME指向解压目录: + +``` +tar -zxvf apache-kylin-3.0.0-bin-cdh57.tar.gz +cd apache-kylin-3.0.0-bin-cdh57 +export KYLIN_HOME=`pwd` +``` + +#### step3、下载SPARK + +由于kylin启动时会对SPARK环境进行检查,所以你需要设置SPARK_HOME指向自己的spark安装路径: + +``` +export SPARK_HOME=/path/to/spark +``` + +如果您没有已经下载好的Spark环境,也可以使用kylin自带脚本下载spark: + +``` +$KYLIN_HOME/bin/download-spark.sh +``` + +脚本会将解压好的spark放在$KYLIN_HOME目录下,如果系统中没有设置SPARK_HOME,启动kylin时会自动找到$KYLIN_HOME目录下的spark。 + +#### step4、环境检查 + +Kylin 运行在 Hadoop 集群上,对各个组件的版本、访问权限及 CLASSPATH 等都有一定的要求,为了避免遇到各种环境问题,您可以执行 + +``` +$KYLIN_HOME/bin/check-env.sh +``` + +来进行环境检测,如果您的环境存在任何的问题,脚本将打印出详细报错信息。如果没有报错信息,代表您的环境适合 Kylin 运行。 + +#### step5、启动kylin + +运行如下命令来启动kylin: + +``` +$KYLIN_HOME/bin/kylin.sh start +``` + +如果启动成功,命令行的末尾会输出如下内容: + +``` +A new Kylin instance is started by root. To stop it, run 'kylin.sh stop' +Check the log at /usr/local/apache-kylin-3.0.0-bin-cdh57/logs/kylin.log +Web UI is at http://<hostname>:7070/kylin +``` + +#### step6、访问kylin + +Kylin 启动后您可以通过浏览器 http://<hostname>:port/kylin 进行访问。 +其中 <hostname> 为具体的机器名、IP 地址或域名,port为kylin端口,默认为7070。 +初始用户名和密码是 ADMIN/KYLIN。 +服务器启动后,可以通过查看 $KYLIN_HOME/logs/kylin.log 获得运行时日志。 + +#### step7、创建Sample Cube + +Kylin提供了一个创建样例Cube的脚本,以供用户快速体验Kylin。 +在命令行运行: + +``` +$KYLIN_HOME/bin/sample.sh +``` + +完成后登陆kylin,点击System->Configuration->Reload Metadata来重载元数据 +元数据重载完成后你可以在左上角的Project中看到一个名为learn_kylin的项目,它包含kylin_sales_cube和kylin_streaming_cube, 它们分别为batch cube和streaming cube,你可以直接对kylin_sales_cube进行构建,构建完成后就可以查询。 +对于kylin_streaming_cube,需要设置KAFKA_HOME指向你的kafka安装目录: + +``` +export KAFKA_HOME=/path/to/kafka +``` + +然后执行 + +``` +${KYLIN_HOME}/bin/sample-streaming.sh +``` + +该脚本会在 localhost:9092 broker 中创建名为 kylin_streaming_topic 的 Kafka Topic,它也会每秒随机发送 100 条 messages 到 kylin_streaming_topic,然后你可以对kylin_streaming_cube进行构建。 + +关于sample cube,可以参考[Sample 
Cube](/cn/docs/tutorial/kylin_sample.html)。 + +当然,你也可以根据下面的教程来尝试创建自己的Cube。 + +#### step8、创建project + +登陆kylin后,点击左上角的+号来创建Project: + + + +#### step9、加载Hive表 + +点击Model->Data Source->Load Table From Tree, +Kylin会读取到Hive数据源中的表并以树状方式显示出来,你可以选择自己要使用的表,然后点击sync进行将其加载到kylin。 + + + +#### step10、创建模型 + +点击Model->New->New Model: + + + +输入Model Name点击Next进行下一步,选择Fact Table和Lookup Table,添加Lookup Table时需要设置与事实表的JOIN条件。 + + + +然后点击Next到下一步添加Dimension: + + + +点击Next下一步添加Measure: + + + +点击Next下一步跳转到设置时间分区列和过滤条件页面,时间分区列用于增量构建时选择时间范围,如果不设置时间分区列则代表该model下的cube都是全量构建。过滤条件会在打平表时用于where条件。 + + + +最后点击Save保存模型。 + +#### step11、创建Cube + +选择Model->New->New Cube + + + +点击Next到下一步添加Dimension,Lookup Table的维度可以设置为Normal(普通维度)或者Derived(衍生维度)两种类型,默认设置为衍生维度,衍生维度代表该列可以从所属维度表的主键中衍生出来,所以实际上只有主键列会被Cube加入计算。 + + + +点击Next到下一步,点击+Measure来添加需要预计算的度量。Kylin会默认创建一个Count(1)的度量。Kylin支持SUM、MIN、MAX、COUNT、COUNT_DISTINCT、TOP_N、EXTENDED_COLUMN、PERCENTILE八种度量。请为COUNT_DISTINCT和TOP_N选择合适的返回类型,这关系到Cube的大小。添加完成之后点击ok,该Measure将会显示在Measures列表中 + + + +添加完所有Measure后点击Next进行下一步,这一页是关于Cube数据刷新的设置。在这里可以设施自动合并的阈值(Auto Merge Thresholds)、数据保留的最短时间(Retention Threshold)以及第一个Segment的起点时间。 + + + +点击Next跳转到下一页高级设置。在这里可以设置聚合组、RowKeys、Mandatory Cuboids、Cube Engine等。 + +关于高级设置的详细信息,可以参考[create_cube](/cn/docs/tutorial/create_cube.html) 页面中的步骤5,其中对聚合组等设置进行了详细介绍。 + +关于更多维度优化,可以阅读[aggregation-group](/blog/2016/02/18/new-aggregation-group/)。 + + + +对于高级设置不是很熟悉时可以先保持默认设置,点击Next跳转到Kylin Properties页面,你可以在这里重写cube级别的kylin配置项,定义覆盖的属性,配置项请参考[配置项](/cn/docs/install/configuration.html)。 + + + +配置完成后,点击Next按钮到下一页,这里可以预览你正在创建的Cube的基本信息,并且可以返回之前的步骤进行修改。如果没有需要修改的部分,就可以点击Save按钮完成Cube创建。之后,这个Cube将会出现在你的Cube列表中。 + + + +#### step12、构建Cube + +上一个步骤创建好的Cube只有定义,而没有计算好的数据,它的状态是‘DISABLED’,是不可以查询的。要想让Cube有数据,还需要对它进行构建。 + +Cube的构建方式通常有两种:全量构建和增量构建。 + + +点击要构建的Cube的Actions列下的Action展开,选择Build,如果Cube所属Model中没有设置时间分区列,则默认全量构建,点击Submit直接提交构建任务。 + +如果设置了时间分区列,则会出现如下页面,在这里你要选择构建数据的起止时间: + + + 
+设置好起止时间后,点击Submit提交构建任务。然后你可以在Monitor页面观察构建任务的状态。Kylin会在页面上显示每一个步骤的运行状态、输出日志以及MapReduce任务。可以在${KYLIN_HOME}/logs/kylin.log中查看更详细的日志信息。 + + + +任务构建完成后,Cube状态会变成READY,并且可以看到Segment的信息。 + + + +#### step13、查询Cube + +Cube构建完成后,在Insight页面的Tables列表下面可以看到构建完成的Cube的table,并可以对其进行查询.查询语句击中Cube后会返回存储在Hbase中的预计算结果。 + + + +恭喜,进行到这里你已经具备了使用Kylin的基本技能,可以去发现和探索更多更强大的功能了。 \ No newline at end of file diff --git a/website/_docs31/gettingstarted/quickstart.md b/website/_docs31/gettingstarted/quickstart.md new file mode 100644 index 0000000..6694d6a --- /dev/null +++ b/website/_docs31/gettingstarted/quickstart.md @@ -0,0 +1,303 @@ +--- +layout: docs-cn +title: Quick Start +categories: start +permalink: /docs31/gettingstarted/kylin-quickstart.html +since: v0.6.x +--- + +This guide aims to provide novice Kylin users with a complete process guide from download and installation to a sub-second query experience. The guide is divided into two parts, which respectively introduce the installation and use of Kylin in two scenarios – with an installation based on a Hadoop environment and installation from Docker image without a Hadoop environment. + +Users can follow these steps to get an initial understanding of how to use Kylin, master the basic skills of Kylin and then use Kylin to design models and speed up queries based on their own business scenarios. + +### 01 Install Kylin From a Docker Image + +In order to make it easy for users to try out Kylin, Zhu Weibin of Ant Financial has contributed “Kylin Docker Image” to the community. In this image, various services that Kylin depends on have been installed and deployed, including: + +- Jdk 1.8 +- Hadoop 2.7.0 +- Hive 1.2.1 +- Hbase 1.1.2 +- Spark 2.3.1 +- Zookeeper 3.4.6 +- Kafka 1.1.1 +- Mysql +- Maven 3.6.1 + +We have uploaded the user facing Kylin image to the Docker repository. Users do not need to build the image locally; they only need to install Docker to experience Kylin’s one-click installation. 
+ +#### Step1 +First, execute the following command to pull the image from the Docker repository: + +``` +docker pull apachekylin/apache-kylin-standalone:3.0.1 +``` + +The image here contains the latest version of Kylin: Kylin v3.0.1. This image contains all of the big data components that Kylin depends on, so it takes a long time to pull the image; please be patient. After the pull is successful, it is displayed as follows: + + + +#### Step2 +Execute the following command to start the container: + +``` +docker run -d \ +-m 8G \ +-p 7070:7070 \ +-p 8088:8088 \ +-p 50070:50070 \ +-p 8032:8032 \ +-p 8042:8042 \ +-p 16010:16010 \ +apachekylin/apache-kylin-standalone:3.0.1 +``` + +The container will start shortly. Since the specified ports in the container have been mapped to local ports, you can directly open the pages of each service in the local browser, such as: +- Kylin Page: http://127.0.0.1:7070/kylin/ +- HDFS NameNode Page: http://127.0.0.1:50070 +- Yarn ResourceManager Page: http://127.0.0.1:8088 +- HBase Page: http://127.0.0.1:16010 + +When the container starts, the following services are automatically started: +- NameNode, DataNode +- ResourceManager, NodeManager +- HBase +- Kafka +- Kylin + +It will also automatically run $KYLIN_HOME/bin/sample.sh, create a kylin_streaming_topic topic in Kafka and keep sending data to that topic, so that users can build and query cubes in both batch and streaming mode as soon as the container is launched. + +Users can enter the container through the docker exec command. 
The relevant environment variables in the container are as follows: +- JAVA_HOME=/home/admin/jdk1.8.0_141 +- HADOOP_HOME=/home/admin/hadoop-2.7.0 +- KAFKA_HOME=/home/admin/kafka_2.11-1.1.1 +- SPARK_HOME=/home/admin/spark-2.3.1-bin-hadoop2.6 +- HBASE_HOME=/home/admin/hbase-1.1.2 +- HIVE_HOME=/home/admin/apache-hive-1.2.1-bin +- KYLIN_HOME=/home/admin/apache-kylin-3.0.0-alpha2-bin-hbase1x + +After logging in to Kylin with the username and password ADMIN/KYLIN, users can use the sample cube to try building and querying a cube, or they can create and query their own models and cubes by following the tutorial from Step 8 in “Install and Use Kylin Based on a Hadoop Environment” below. + +### 02 Install and Use Kylin Based on a Hadoop Environment + +Users who already have a stable Hadoop environment can download Kylin’s binary package and deploy it on their Hadoop cluster. Before installation, check the environment according to the following requirements. + +#### Environmental Inspection + +(1) Pre-Conditions: Kylin relies on a Hadoop cluster to process large data sets. You need to prepare a Hadoop cluster configured with HDFS, YARN, MapReduce, Hive, HBase, Zookeeper and other services for Kylin to run. + +Kylin can be started on any node of a Hadoop cluster. For convenience, you can run Kylin on the master node, but for better stability, we recommend that you deploy Kylin on a clean Hadoop client node. On that node, the Hive, HBase, HDFS and other command-line clients should already be installed, and the client configuration (such as core-site.xml, hive-site.xml, hbase-site.xml and others) should be properly configured and automatically kept in sync with the other nodes. + +The Linux account running Kylin must have access to the Hadoop cluster, including permissions to create/write HDFS folders, Hive tables and HBase tables, and to submit MapReduce tasks. 
+ + + +(2) Hardware Requirements: The server running Kylin should have at least a 4-core CPU, 16 GB of memory and 100 GB of disk. + + + +(3) Operating System Requirements: CentOS 6.5+ or Ubuntu 16.04+ + + + +(4) Software Requirements: Hadoop 2.7+, 3.0-3.1; Hive 0.13+, 1.2.1+; HBase 1.1+, 2.0 (supported since Kylin 2.5); JDK: 1.8+ + + + +It is recommended to use an integrated Hadoop environment for Kylin installation and testing, such as Hortonworks HDP or Cloudera CDH. Before release, Kylin was tested on Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, and Azure HDInsight 3.5-3.6. + +#### Install and Use +When your environment meets the above prerequisites, you can install and start using Kylin. + +#### Step1. Download the Kylin Archive +Download a binary for your version of Hadoop from [Apache Kylin Download Site](https://kylin.apache.org/download/). The latest releases are currently Kylin 3.0.1 and Kylin 2.6.5; version 3.0 supports ingesting data in real time for pre-calculation. If your Hadoop environment is CDH 5.7, you can download Kylin 3.0.0 using the following command line: + +``` +cd /usr/local/ +wget http://apache.website-solution.net/kylin/apache-kylin-3.0.0/apache-kylin-3.0.0-bin-cdh57.tar.gz +``` + +#### Step2. Extract Kylin +Extract the downloaded Kylin archive and configure the environment variable KYLIN_HOME to point to the extracted directory: + +``` +tar -zxvf apache-kylin-3.0.0-bin-cdh57.tar.gz +cd apache-kylin-3.0.0-bin-cdh57 +export KYLIN_HOME=`pwd` +``` + +#### Step3. Download Spark +Since Kylin checks the Spark environment when it starts, you need to set SPARK_HOME: + +``` +export SPARK_HOME=/path/to/spark +``` + +If you don’t have a Spark environment already downloaded, you can also download Spark using Kylin’s own script: + +``` +$KYLIN_HOME/bin/download-spark.sh +``` + +The script will place the decompressed Spark in the $KYLIN_HOME directory. 
If SPARK_HOME is not set in the system, the Spark in the $KYLIN_HOME directory will be found automatically when Kylin is started. + +#### Step4. Environmental Inspection +Kylin runs on a Hadoop cluster and has certain requirements for the version, access permissions and CLASSPATH of each component. +To avoid environment problems, you can run the $KYLIN_HOME/bin/check-env.sh script to check your environment. +The script will print out detailed error messages if any errors are identified. If there is no error message, your environment is suitable for running Kylin. + +#### Step5. Start Kylin +Run the following command to start Kylin: + +``` +$KYLIN_HOME/bin/kylin.sh start +``` + +If the startup is successful, the following will be output at the end of the command line: + +``` +A new Kylin instance is started by root. To stop it, run 'kylin.sh stop' +Check the log at /usr/local/apache-kylin-3.0.0-bin-cdh57/logs/kylin.log +Web UI is at http://<hostname>:7070/kylin +``` + +The default port used by Kylin is 7070. You can run $KYLIN_HOME/bin/kylin-port-replace-util.sh set number to modify the port. The modified port is 7070 + number. + +#### Step6. Visit Kylin +After Kylin starts, you can access it through your browser at http://<hostname>:port/kylin, where <hostname> is the machine name, IP address or domain name and port is the Kylin port (7070 by default). +The initial username and password are ADMIN/KYLIN. After the server starts, you can get the runtime log by looking at $KYLIN_HOME/logs/kylin.log. + +#### Step7. Create Sample Cube +Kylin provides a script to create a sample cube for users to quickly experience Kylin. Run from the command line: + +``` +$KYLIN_HOME/bin/sample.sh +``` + +When the script finishes, log in to Kylin and click System -> Configuration -> Reload Metadata to reload the metadata. 
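Once Kylin is up, the port rule from Step5 and the default login from Step6 can also be checked from the shell. The following is a minimal sketch and not part of the Kylin distribution: `KYLIN_HOST`, `KYLIN_PORT_OFFSET` and `RUN_KYLIN_CHECK` are hypothetical variable names chosen for illustration, and it assumes the `base64` and `curl` commands are available. The endpoint used is `POST /kylin/api/user/authentication` from Kylin's REST API page.

```sh
#!/bin/sh
# Hypothetical helper variables (assumptions, not Kylin settings):
#   KYLIN_HOST        - host where Kylin runs, defaults to localhost
#   KYLIN_PORT_OFFSET - the "number" given to kylin-port-replace-util.sh, defaults to 0
KYLIN_HOST="${KYLIN_HOST:-localhost}"
KYLIN_PORT_OFFSET="${KYLIN_PORT_OFFSET:-0}"

# Step5: the effective port is 7070 plus the configured offset
KYLIN_PORT=$((7070 + KYLIN_PORT_OFFSET))
KYLIN_URL="http://${KYLIN_HOST}:${KYLIN_PORT}/kylin"

# Step6: Basic auth token for the initial ADMIN/KYLIN account
AUTH_TOKEN=$(printf '%s:%s' "ADMIN" "KYLIN" | base64)

echo "Web UI: ${KYLIN_URL}"
echo "Authorization: Basic ${AUTH_TOKEN}"

# Contact the server only when explicitly requested
# (RUN_KYLIN_CHECK is a hypothetical switch for this sketch)
if [ "${RUN_KYLIN_CHECK:-no}" = "yes" ]; then
  curl -s -X POST \
    -H "Authorization: Basic ${AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    "${KYLIN_URL}/api/user/authentication"
fi
```

Running it with `RUN_KYLIN_CHECK=yes` sends the request; a JSON response containing `userDetails` suggests the login works. Without the switch the script only prints the URL and header, so it is safe to run on a machine where Kylin is not started yet.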
+ +After the metadata is reloaded, you can see a project named learn_kylin in Project in the upper left corner. +This contains kylin_sales_cube and kylin_streaming_cube, which are a batch cube and a streaming cube, respectively. +You can build the kylin_sales_cube directly and you can query it after the build is completed. +For kylin_streaming_cube, you need to set KAFKA_HOME and then execute ${KYLIN_HOME}/bin/sample-streaming.sh. +This script will create a Kafka topic named kylin_streaming_topic in the localhost:9092 broker and send 100 random messages per second to kylin_streaming_topic; you can then build kylin_streaming_cube. + +For more about the sample cube, refer to [Sample Cube](/docs/tutorial/kylin_sample.html). + +Of course, you can also try to create your own cube based on the following tutorial. + +#### Step8. Create Project +After logging in to Kylin, click the + in the upper left corner to create a Project. + + + +#### Step9. Load Hive Table +Click Model -> Data Source -> Load Table From Tree. +Kylin reads the tables in the Hive data source and displays them in a tree. You can choose the tables you would like to add to models and then click Sync. The selected tables will then be loaded into Kylin. + + + +They then appear in the Tables directory of the data source. + +#### Step10. Create the Model +Click Model -> New -> New Model: + + + +Enter the Model Name and click Next, then select Fact Table and Lookup Table. You need to set the JOIN condition with the fact table when adding a Lookup Table. + + + +Then click Next to select Dimension: + + + +Next, select Measure: + + + +The next step is to set the time partition column and filter conditions. The time partition column is used to select the time range for incremental builds. If no time partition column is set, all cubes under this model are built in full. The filter condition is used as the where condition when flattening the table. + + + +Then, click Save to save the model. 
+ +#### Step11. Create Cube + +Model -> New -> New Cube: + + + +Click Next to add Dimension. The dimensions of the Lookup Table can be set to Normal or Derived. The default setting is derived dimension. Derived dimension means that the column can be derived from the primary key of the dimension table. In fact, only the primary key column will be calculated by the cube. + + + +Click Next and click + Measure to add a pre-calculated measure. + +Kylin creates a Count (1) metric by default. Kylin supports eight metrics: SUM, MIN, MAX, COUNT, COUNT_DISTINCT, TOP_N, EXTENDED_COLUMN and PERCENTILE. + +Please select the appropriate return type for COUNT_DISTINCT and TOP_N, which is related to the size of the cube. +Click OK after the addition is complete and the measure will be displayed in the Measures list. + + + +After adding all of the measures, click Next to proceed. This page is about the settings for cube data refresh. +Here you can set the threshold for automatic merge (Auto Merge Thresholds), the minimum time for data retention (Retention Threshold) and the start time of the first segment. + + + +Click Next to continue going through the Advanced Settings. +Here you can set the aggregation group, RowKeys, Mandatory Cuboids, Cube Engine, etc. + +For more information about Advanced Settings, you can refer to Step 5 on the [create_cube](/docs/tutorial/create_cube.html), which details the settings for additional options. + +For more dimensional optimization, you can read: [aggregation-group](/blog/2016/02/18/new-aggregation-group/). + + + +If you are not familiar with Advanced Settings, you can keep the default settings first. Click Next to jump to the Kylin Properties page. Here you can override the cube-level Kylin configuration items and define the properties to be covered. +For configuration items, please refer to: [configuration](/docs/install/configuration.html). + + + +After the configuration is complete, click the Next button to the next page. 
+Here you can preview the basic information of the cube you are creating and you can return to the previous steps to modify it. +If you don’t need to make any changes, you can click the Save button to complete the cube creation. +After that, this cube will appear in your cube list. + + + +#### Step12. Build Cube + +The cube created in the previous step has definitions but no calculated data. Its status is “DISABLED” and it cannot be queried. If you want the cube to have data, you need to build it. Cubes are usually built in one of two ways: full builds or incremental builds. + +In the Actions column of the cube you want to build, click Action to expand the menu and select Build. + +If no time partition column is set in the model to which the cube belongs, the cube is built in full by default. + +In that case, click Submit to submit the build task directly. If a time partition column is set, the following page will appear, where you will need to select the start and end time for building the data. + + + +After setting the start and end time, click Submit to submit the build task. +You can then observe the status of the build task on the Monitor page. +For each step, Kylin displays the running status, the output log and the MapReduce tasks. +You can view more detailed log information in ${KYLIN_HOME}/logs/kylin.log. + + + +After the build job finishes, the status of the cube will change to READY and you can see the segment information. + + + +#### Step13. Query Cube +After the cube is built, you can see the table of the built cube and query it under the Tables list on the Insight page. +When a query hits the cube, it returns the pre-calculated results stored in HBase. + + + +Congratulations, you have now acquired the basic skills for using Kylin and can go on to explore more of its powerful features. 
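The query from Step13 can also be sent through the REST API instead of the Insight page. The following is a minimal sketch under stated assumptions: it uses the `POST /kylin/api/query` endpoint from Kylin's REST API page, assumes the sample project `learn_kylin` has been built, and assumes its fact table is named `kylin_sales` (the table name is not given in this guide). `KYLIN_URL` and `RUN_KYLIN_QUERY` are hypothetical variable names used only for illustration.

```sh
#!/bin/sh
# Hypothetical settings for this sketch (assumptions, not Kylin configuration)
KYLIN_URL="${KYLIN_URL:-http://localhost:7070/kylin}"
PROJECT="learn_kylin"
# Assumed table name from the sample cube; replace with a table from your own model
SQL="select count(*) from kylin_sales"

# Basic auth token for the initial ADMIN/KYLIN account
AUTH_TOKEN=$(printf '%s:%s' "ADMIN" "KYLIN" | base64)

# Request body: the SQL text, the project to run it in, and a row limit
QUERY_JSON=$(printf '{"sql":"%s","project":"%s","limit":10}' "$SQL" "$PROJECT")
echo "$QUERY_JSON"

# Contact the server only when explicitly requested
# (RUN_KYLIN_QUERY is a hypothetical switch for this sketch)
if [ "${RUN_KYLIN_QUERY:-no}" = "yes" ]; then
  curl -s -X POST \
    -H "Authorization: Basic ${AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$QUERY_JSON" \
    "${KYLIN_URL}/api/query"
fi
```

The request body is built with `printf` only because the SQL here contains no quotes or percent signs; a statement with string literals would need proper JSON escaping.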
+ + + + diff --git a/website/_docs31/howto/howto_use_mr_hive_dict.cn.md b/website/_docs31/howto/howto_use_mr_hive_dict.cn.md new file mode 100644 index 0000000..97b458e --- /dev/null +++ b/website/_docs31/howto/howto_use_mr_hive_dict.cn.md @@ -0,0 +1,65 @@ +--- +layout: docs31-cn +title: 使用Hive构建全局字典 +categories: howto +permalink: /cn/docs31/howto/howto_use_hive_mr_dict.html +--- + +## Global Dictionary in Hive + +### 背景介绍 +Count distinct(bitmap) 度量对于许多场景来说都非常重要, 比如统计点击量, kylin从1.5.3版本开始支持精确去重. +Apache Kylin 实现了基于bitmap的精确去重, 并且使用全局字典将字符串类型编码为整数类型。 +当前的全局字典是单线程构建的,对于高基列可能会占用大量的时间和内存。 +Kylin v3.0.0 引入了第一版的 Hive global dictionary(KYLIN-3841). 这个功能使用Hive的分布式SQL引擎来构建全局字典。 +为了进一步提升性能, kylin v3.1.0 引入了第二版的Hive global dictionary v2(KYLIN-4342), 这个版本在某些步骤使用MapReduce代替HQL进行全局字典的构建。 + +### 收益总结 +1.使用分布式的方式来构建全局字典,节省时间。 +2.Kylin集群中的Job Server可以做更少的工作, 因此会更加稳定。 +3.OneID, Hive Global Dictionary在kylin之外仍然具有可读性,因此每个人都可以在公司其他场景中重用这个字典。 + +## 如何使用 +如果你有一些count distinct(bitmap)的度量,并且该列的数据类型是string,你可以使用Hive Global Dictionary。 +比如,如果列名为PV_ID和USER_ID,表名为USER_ACTION,则可以在cube级别添加配置`kylin.dictionary.mr-hive.columns=USER_ACTION_PV_ID,USER_ACTION_USER_ID`以启用这个功能。 +请不要在integer类型的列使用Hive Global Dictionary,因为在falt table中这种列会被经过编码的整数类型替换,这样如果在同一列上有sum/max/min这些度量,聚合结果将会不正确。 +Hive Global Dictionary功能与shrunken global dictionary(KYLIN-3491)是冲突的因为它们解决的是同一个问题,但是解决方式不同。 + +### 相关配置项 + +- `kylin.dictionary.mr-hive.columns` is used to specific which columns need to use Hive-MR dict, should be *TABLE1_COLUMN1,TABLE2_COLUMN2*. Better configured in cube level, default value is empty. +- `kylin.dictionary.mr-hive.database` is used to specific which database Hive-MR dict table located, default value is *default*. +- `kylin.hive.union.style` Sometime sql which used to build global dict table may have problem in union syntax, you may refer to Hive Doc for more detail. The default value is *UNION*, using lower version of Hive should change to *UNION ALL*. 
+- `kylin.dictionary.mr-hive.table.suffix` is used to specific suffix of global dict table, default value is *_global_dict*. +- `kylin.dictionary.mr-hive.intermediate.table.suffix` is used to specific suffix for distinct value table, default value is *_group_by*. +- `kylin.dictionary.mr-hive.columns.reduce.num` A key/value structure(or a map), which key is {TABLE_NAME}_{COLUMN_NAME}, and value is number for expected reducers in Build Segment Level Dictionary (MR job Parallel Part Build). +- `kylin.dictionary.mr-hive.ref.columns` To reuse other global dictionary(s), you can specific a list here, to refer to some existent global dictionary(s) built by another cube. + +---- + +### Step by Step + +#### 添加精确去重度量 + + + +#### 在cube级别配置hive字典列 + + + +#### 构建新的segment + + + + + +关于这个功能的更多细节请参考 [Apache Kylin Wiki](https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary) + +### 参考链接 + +- https://issues.apache.org/jira/browse/KYLIN-3491 +- https://issues.apache.org/jira/browse/KYLIN-3841 +- https://issues.apache.org/jira/browse/KYLIN-3905 +- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union +- http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/ +- https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary \ No newline at end of file diff --git a/website/_docs31/howto/howto_use_mr_hive_dict.md b/website/_docs31/howto/howto_use_mr_hive_dict.md new file mode 100644 index 0000000..12ed322 --- /dev/null +++ b/website/_docs31/howto/howto_use_mr_hive_dict.md @@ -0,0 +1,67 @@ +--- +layout: docs31 +title: Use Hive to build global dictionary +categories: howto +permalink: /docs31/howto/howto_use_hive_mr_dict.html +--- + +## Global Dictionary in Hive + +### Background +Count distinct(bitmap) measure is very important for many scenario, such as PageView statistics, and Kylin support count distinct since 1.5.3 . 
+Apache Kylin implements precise count distinct based on bitmaps and uses a global dictionary to encode string values into integers. +Currently the global dictionary has to be built in a single process/JVM, which may take a lot of time and memory for ultra-high-cardinality (UHC) columns. +Kylin v3.0.0 introduced Hive global dictionary v1 (KYLIN-3841). With this feature, Kylin uses Hive, a distributed SQL engine, to build the global dictionary. +To improve performance further, Kylin v3.1.0 introduced Hive global dictionary v2 (KYLIN-4342), which uses MapReduce instead of HQL in some steps. + +### Benefit Summary +1. The global dictionary is built in a distributed way, so the build job takes less time. +2. The Job Server does less work and is therefore more stable. +3. OneID: because the Hive global dictionary is readable outside of Kylin, anyone can reuse this dictionary (a Hive table) in other scenarios across the company. + +## How to use + +If you have count distinct (bitmap) measures on columns whose data type is String, you may need Hive Global Dictionary. For example, if the column names are PV_ID and USER_ID and the table name is USER_ACTION, you can add the cube-level configuration `kylin.dictionary.mr-hive.columns=USER_ACTION_PV_ID,USER_ACTION_USER_ID` to enable this feature. + +Please don't use Hive global dictionary on integer columns: the value is replaced with the encoded integer in the flat Hive table, so any sum/max/min measure on the same column will return wrong results. + +Note that this feature conflicts with the shrunken global dictionary (KYLIN-3491), because they solve the same problem in different ways. + +### Configuration + +- `kylin.dictionary.mr-hive.columns` specifies which columns use the Hive-MR dictionary, in the form *TABLE1_COLUMN1,TABLE2_COLUMN2*. It is best configured at cube level; the default value is empty. +- `kylin.dictionary.mr-hive.database` specifies the database in which the Hive-MR dictionary table is located; the default value is *default*. 
+- `kylin.hive.union.style` The SQL used to build the global dictionary table may sometimes hit problems with union syntax; refer to the Hive documentation for details. The default value is *UNION*; with lower versions of Hive, change it to *UNION ALL*. +- `kylin.dictionary.mr-hive.table.suffix` specifies the suffix of the global dictionary table; the default value is *_global_dict*. +- `kylin.dictionary.mr-hive.intermediate.table.suffix` specifies the suffix of the distinct value table; the default value is *_group_by*. +- `kylin.dictionary.mr-hive.columns.reduce.num` A key/value structure (a map) whose key is {TABLE_NAME}_{COLUMN_NAME} and whose value is the expected number of reducers in the Build Segment Level Dictionary step (MR job Parallel Part Build). +- `kylin.dictionary.mr-hive.ref.columns` To reuse other global dictionaries, you can specify a list here that refers to existing global dictionaries built by another cube. + +---- + +## Step + +#### Add count_distinct(bitmap) measure + + + +#### Set hive-dict-column in cube level config + + + +#### Build new segment + + + + + +For more details about this feature, please refer to the [Apache Kylin Wiki](https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary) + +### Reference Link + +- https://issues.apache.org/jira/browse/KYLIN-3491 +- https://issues.apache.org/jira/browse/KYLIN-3841 +- https://issues.apache.org/jira/browse/KYLIN-3905 +- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union +- http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/ +- https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary \ No newline at end of file diff --git a/website/_docs31/howto/howto_use_restapi.cn.md b/website/_docs31/howto/howto_use_restapi.cn.md index 6ad968c..f5e0ac4 100644 --- a/website/_docs31/howto/howto_use_restapi.cn.md +++ b/website/_docs31/howto/howto_use_restapi.cn.md @@ -7,13 +7,20 @@ since: v0.7.1 --- This page lists the major RESTful APIs provided by 
Kylin. - -* Query +* Authentication * [Authentication](#authentication) +* Query * [Query](#query) + * [Prepare query](#prepare-query) + * [Save query](#save-query) + * [Remove saved query](#remove-saved-query) + * [Get saved queries](#get-saved-queries) + * [Get running queries](#get-running-queries) + * [Stop query](#stop-query) * [List queryable tables](#list-queryable-tables) * CUBE * [Create cube](#create-cube) + * [Update cube](#update-cube) * [List cubes](#list-cubes) * [Get cube](#get-cube) * [Get cube descriptor (dimension, measure info, etc)](#get-cube-descriptor) @@ -23,10 +30,21 @@ This page lists the major RESTful APIs provided by Kylin. * [Disable cube](#disable-cube) * [Purge cube](#purge-cube) * [Delete segment](#delete-segment) + * [Auto-Merge segment](#auto-merge-segment) + * [Get sql of a cube](#get-sql-of-a-cube) + * [Get sql of a cube segment](#get-sql-of-a-cube-segment) + * [Force rebuild lookup table snapshot](#force-rebuild-lookup-table-snapshot) + * [Clone cube](#clone-cube) + * [Delete Cube](#delete-cube) + * [Get hbase info](#get-hbase-info) + * [Get current cuboid](#get-current-cuboid) + * [Migrate cube](#migrate-cube) * MODEL * [Create model](#create-model) + * [Update model](#update-model) * [Get modelDescData](#get-modeldescdata) * [Delete model](#delete-model) + * [Clone model](#clone-model) * JOB * [Resume job](#resume-job) * [Pause job](#pause-job) @@ -35,25 +53,34 @@ This page lists the major RESTful APIs provided by Kylin. 
* [Get job status](#get-job-status) * [Get job step output](#get-job-step-output) * [Get job list](#get-job-list) + * [Get job status overview](#get-job-status-overview) + * [Resubmit realtime build job](#resubmit-realtime-build-job) + * [Rollback job](#rollback-job) * Metadata * [Get Hive Table](#get-hive-table) * [Get Hive Tables](#get-hive-tables) * [Load Hive Tables](#load-hive-tables) + * [Unload Hive Tables](#unload-hive-tables) + * [Show databases in hive](#show-databases-in-hive) + * [Show tables in a hive database](#show-tables-in-a-hive-database) * Cache * [Wipe cache](#wipe-cache) + * [Announce wipe cache](#announce-wipe-cache) + * [Hot load kylin config](#hot-load-kylin-config) * Streaming * [Initiate cube start position](#initiate-cube-start-position) * [Build stream cube](#build-stream-cube) * [Check segment holes](#check-segment-holes) * [Fill segment holes](#fill-segment-holes) + * [Get streaming configs](#get-streaming-configs) + * [Get Kafka configs](#get-kafka-configs) + * [Create streaming schema](#create-streaming-schema) + * [Update streaming tables](#update-streaming-schema) * ACL * [Get users can query the table](#get-users-can-query-the-table) * [Get users cannot query the table](#get-users-cannot-query-the-table) * [Put user to table blacklist](#put-user-to-table-blacklist) * [Delete user from table blacklist](#delete-user-from-table-blacklist) -* Metrics - * [Get all metrics](#get-all-metrics) - * [Get specific type of metrics](#get-specific-type-of-metrics) ## Authentication `POST /kylin/api/user/authentication` @@ -72,23 +99,33 @@ python -c "import base64; print base64.standard_b64encode('$UserName:$Password') #### Response Sample ```sh -{ - "userDetails":{ - "password":null, - "username":"sample", - "authorities":[ - { - "authority":"ROLE_ANALYST" - }, - { - "authority":"ROLE_MODELER" - } - ], - "accountNonExpired":true, - "accountNonLocked":true, - "credentialsNonExpired":true, - "enabled":true - } +{ + "userDetails": { + "username": 
"sample", + "password": "null", + "authorities": [ + { + "authority": "ROLE_ADMIN" + }, + { + "authority": "ROLE_ANALYST" + }, + { + "authority": "ROLE_MODELER" + }, + { + "authority": "ALL_USERS" + } + ], + "disabled": false, + "defaultPassword": false, + "locked": false, + "lockedTime": 0, + "wrongTime": 0, + "uuid": "3704ba8c-deb1-ac47-729d-c1039c1bd6ec", + "last_modified": 1585219480112, + "version": "3.0.0.20500" + } } ``` @@ -122,7 +159,6 @@ curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" * limit - `optional` `int` Query limit. If limit is set in sql, perPage will be ignored. * acceptPartial - `optional` `bool` Whether accept a partial result or not, default be "false". Set to "false" for production use. * project - `optional` `string` Project to perform query. Default value is 'DEFAULT'. -* backdoorToggles - `optional` `map` You can set a key/value pair (`"DEBUG_TOGGLE_HIT_CUBE":"SimpleCube_01"`) to specific cube for your query. Default is empty map. #### Request Sample @@ -236,10 +272,86 @@ curl -X POST -H "Authorization: Basic XXXXXXXXX" -H "Content-Type: application/j ``` +## Prepare query +`POST /kylin/api/query/prestate` + +#### Request Body +* sql - `required` `string` The text of sql statement. +* offset - `optional` `int` Query offset. If offset is set in sql, curIndex will be ignored. +* limit - `optional` `int` Query limit. If limit is set in sql, perPage will be ignored. +* acceptPartial - `optional` `bool` Whether accept a partial result or not, default be "false". Set to "false" for production use. +* project - `optional` `string` Project to perform query. Default value is 'DEFAULT'. + +#### Request Sample + +```sh +{ + "sql":"select * from TEST_KYLIN_FACT", + "offset":0, + "limit":50000, + "acceptPartial":false, + "project":"DEFAULT" +} +``` + + +## Save query +`POST /kylin/api/saved_queries` + +#### Request Body +* sql - `required` `string` The text of sql statement. +* name - `required` `string` Sql name. 
+* project - `required` `string` Project to perform query.
+* description - `optional` `string` Sql description.
+
+#### Request Sample
+
+```sh
+{
+    "sql": "select count(*) from kylin_sales",
+    "name": "test",
+    "project": "learn_kylin"
+}
+```
+
+
+## Remove saved query
+`DELETE /kylin/api/saved_queries/{id}`
+
+#### Request Parameters
+* id - `required` `string` The id of the saved query you want to remove.
+
+
+## Get saved queries
+`GET /kylin/api/saved_queries`
+
+#### Response Sample
+```
+[
+    {
+        "name": "test",
+        "project": "learn_kylin",
+        "sql": "select count(*) from kylin_sales",
+        "description": null,
+        "id": "-1674470999"
+    }
+]
+```
+
+## Get running queries
+`GET /kylin/api/query/runningQueries`
+
+## Stop Query
+`PUT /kylin/api/query/{queryId}/stop`
+
+#### Path Variable
+* queryId - `required` `String` The ID of the query you want to stop. You can obtain it from `Get running queries`.
+
+
 ## List queryable tables
 `GET /kylin/api/tables_and_columns`
-#### Request Parameters
+#### Path Variable
 * project - `required` `string` The project to load tables
 #### Response Sample
@@ -346,6 +458,19 @@ curl -X POST -H "Authorization: Basic XXXXXXXXX" -H "Content-Type: application/j
 }
 ```
+## Update Cube
+`PUT /kylin/api/cubes`
+
+#### Request Body
+(Same as "Create Cube")
+
+#### Request Sample
+(Same as "Create Cube")
+
+#### Response Sample
+(Same as "Create Cube")
+
+
 ## List cubes
 `GET /kylin/api/cubes`
@@ -394,252 +519,438 @@ Get descriptor for specified cube instance.
```sh [ { - "uuid": "a24ca905-1fc6-4f67-985c-38fa5aeafd92", - "name": "test_kylin_cube_with_slr_desc", - "description": null, + "uuid": "0ef9b7a8-3929-4dff-b59d-2100aadc8dbf", + "last_modified": 1574402902000, + "version": "3.0.0.20500", + "name": "kylin_sales_cube", + "is_draft": false, + "model_name": "kylin_sales_model", + "description": "", + "null_string": null, "dimensions": [ { - "id": 0, - "name": "CAL_DT", - "table": "EDW.TEST_CAL_DT", - "column": null, - "derived": [ - "WEEK_BEG_DT" - ], - "hierarchy": false - }, + "name": "TRANS_ID", + "table": "KYLIN_SALES", + "column": "TRANS_ID", + "derived": null + }, { - "id": 1, - "name": "CATEGORY", - "table": "DEFAULT.TEST_CATEGORY_GROUPINGS", - "column": null, + "name": "YEAR_BEG_DT", + "table": "KYLIN_CAL_DT", + "column": null, "derived": [ - "USER_DEFINED_FIELD1", - "USER_DEFINED_FIELD3", - "UPD_DATE", - "UPD_USER" - ], - "hierarchy": false - }, + "YEAR_BEG_DT" + ] + }, { - "id": 2, - "name": "CATEGORY_HIERARCHY", - "table": "DEFAULT.TEST_CATEGORY_GROUPINGS", - "column": [ - "META_CATEG_NAME", - "CATEG_LVL2_NAME", - "CATEG_LVL3_NAME" - ], - "derived": null, - "hierarchy": true - }, + "name": "MONTH_BEG_DT", + "table": "KYLIN_CAL_DT", + "column": null, + "derived": [ + "MONTH_BEG_DT" + ] + }, { - "id": 3, - "name": "LSTG_FORMAT_NAME", - "table": "DEFAULT.TEST_KYLIN_FACT", - "column": [ - "LSTG_FORMAT_NAME" - ], - "derived": null, - "hierarchy": false - }, + "name": "WEEK_BEG_DT", + "table": "KYLIN_CAL_DT", + "column": null, + "derived": [ + "WEEK_BEG_DT" + ] + }, { - "id": 4, - "name": "SITE_ID", - "table": "EDW.TEST_SITES", - "column": null, + "name": "USER_DEFINED_FIELD1", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": null, "derived": [ - "SITE_NAME", - "CRE_USER" - ], - "hierarchy": false - }, + "USER_DEFINED_FIELD1" + ] + }, { - "id": 5, - "name": "SELLER_TYPE_CD", - "table": "EDW.TEST_SELLER_TYPE_DIM", - "column": null, + "name": "USER_DEFINED_FIELD3", + "table": "KYLIN_CATEGORY_GROUPINGS", + 
"column": null, "derived": [ - "SELLER_TYPE_DESC" - ], - "hierarchy": false - }, + "USER_DEFINED_FIELD3" + ] + }, { - "id": 6, - "name": "SELLER_ID", - "table": "DEFAULT.TEST_KYLIN_FACT", - "column": [ - "SELLER_ID" - ], - "derived": null, - "hierarchy": false + "name": "META_CATEG_NAME", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": "META_CATEG_NAME", + "derived": null + }, + { + "name": "CATEG_LVL2_NAME", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": "CATEG_LVL2_NAME", + "derived": null + }, + { + "name": "CATEG_LVL3_NAME", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": "CATEG_LVL3_NAME", + "derived": null + }, + { + "name": "LSTG_FORMAT_NAME", + "table": "KYLIN_SALES", + "column": "LSTG_FORMAT_NAME", + "derived": null + }, + { + "name": "SELLER_ID", + "table": "KYLIN_SALES", + "column": "SELLER_ID", + "derived": null + }, + { + "name": "BUYER_ID", + "table": "KYLIN_SALES", + "column": "BUYER_ID", + "derived": null + }, + { + "name": "ACCOUNT_BUYER_LEVEL", + "table": "BUYER_ACCOUNT", + "column": "ACCOUNT_BUYER_LEVEL", + "derived": null + }, + { + "name": "ACCOUNT_SELLER_LEVEL", + "table": "SELLER_ACCOUNT", + "column": "ACCOUNT_SELLER_LEVEL", + "derived": null + }, + { + "name": "BUYER_COUNTRY", + "table": "BUYER_ACCOUNT", + "column": "ACCOUNT_COUNTRY", + "derived": null + }, + { + "name": "SELLER_COUNTRY", + "table": "SELLER_ACCOUNT", + "column": "ACCOUNT_COUNTRY", + "derived": null + }, + { + "name": "BUYER_COUNTRY_NAME", + "table": "BUYER_COUNTRY", + "column": "NAME", + "derived": null + }, + { + "name": "SELLER_COUNTRY_NAME", + "table": "SELLER_COUNTRY", + "column": "NAME", + "derived": null + }, + { + "name": "OPS_USER_ID", + "table": "KYLIN_SALES", + "column": "OPS_USER_ID", + "derived": null + }, + { + "name": "OPS_REGION", + "table": "KYLIN_SALES", + "column": "OPS_REGION", + "derived": null } - ], + ], "measures": [ { - "id": 1, - "name": "GMV_SUM", + "name": "GMV_SUM", "function": { - "expression": "SUM", + "expression": "SUM", "parameter": 
{ - "type": "column", - "value": "PRICE", - "next_parameter": null - }, + "type": "column", + "value": "KYLIN_SALES.PRICE" + }, "returntype": "decimal(19,4)" - }, - "dependent_measure_ref": null - }, + } + }, { - "id": 2, - "name": "GMV_MIN", + "name": "BUYER_LEVEL_SUM", "function": { - "expression": "MIN", + "expression": "SUM", "parameter": { - "type": "column", - "value": "PRICE", - "next_parameter": null - }, - "returntype": "decimal(19,4)" - }, - "dependent_measure_ref": null - }, + "type": "column", + "value": "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL" + }, + "returntype": "bigint" + } + }, { - "id": 3, - "name": "GMV_MAX", + "name": "SELLER_LEVEL_SUM", "function": { - "expression": "MAX", + "expression": "SUM", "parameter": { - "type": "column", - "value": "PRICE", - "next_parameter": null - }, - "returntype": "decimal(19,4)" - }, - "dependent_measure_ref": null - }, + "type": "column", + "value": "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL" + }, + "returntype": "bigint" + } + }, { - "id": 4, - "name": "TRANS_CNT", + "name": "TRANS_CNT", "function": { - "expression": "COUNT", + "expression": "COUNT", "parameter": { - "type": "constant", - "value": "1", - "next_parameter": null - }, + "type": "constant", + "value": "1" + }, "returntype": "bigint" - }, - "dependent_measure_ref": null - }, + } + }, { - "id": 5, - "name": "ITEM_COUNT_SUM", + "name": "SELLER_CNT_HLL", "function": { - "expression": "SUM", + "expression": "COUNT_DISTINCT", "parameter": { - "type": "column", - "value": "ITEM_COUNT", - "next_parameter": null - }, - "returntype": "bigint" - }, - "dependent_measure_ref": null + "type": "column", + "value": "KYLIN_SALES.SELLER_ID" + }, + "returntype": "hllc(10)" + } + }, + { + "name": "TOP_SELLER", + "function": { + "expression": "TOP_N", + "parameter": { + "type": "column", + "value": "KYLIN_SALES.PRICE", + "next_parameter": { + "type": "column", + "value": "KYLIN_SALES.SELLER_ID" + } + }, + "returntype": "topn(100)", + "configuration": { + 
"topn.encoding.KYLIN_SALES.SELLER_ID": "dict", + "topn.encoding_version.KYLIN_SALES.SELLER_ID": "1" + } + } } - ], + ], "rowkey": { "rowkey_columns": [ { - "column": "SELLER_ID", - "length": 18, - "dictionary": null, - "mandatory": true - }, + "column": "KYLIN_SALES.BUYER_ID", + "encoding": "integer:4", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.SELLER_ID", + "encoding": "integer:4", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.TRANS_ID", + "encoding": "integer:4", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.PART_DT", + "encoding": "date", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "CAL_DT", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_SALES.LEAF_CATEG_ID", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "LEAF_CATEG_ID", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "META_CATEG_NAME", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "CATEG_LVL2_NAME", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "CATEG_LVL3_NAME", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "LSTG_FORMAT_NAME", - "length": 12, - "dictionary": null, - "mandatory": false - }, + "column": "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL", + "encoding": "dict", + "encoding_version": 
1, + "isShardBy": false + }, { - "column": "LSTG_SITE_ID", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "BUYER_ACCOUNT.ACCOUNT_COUNTRY", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "SLR_SEGMENT_CD", - "length": 0, - "dictionary": "true", - "mandatory": false + "column": "SELLER_ACCOUNT.ACCOUNT_COUNTRY", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "BUYER_COUNTRY.NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "SELLER_COUNTRY.NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.LSTG_FORMAT_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.LSTG_SITE_ID", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.OPS_USER_ID", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.OPS_REGION", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false } - ], - "aggregation_groups": [ - [ - "LEAF_CATEG_ID", - "META_CATEG_NAME", - "CATEG_LVL2_NAME", - "CATEG_LVL3_NAME", - "CAL_DT" - ] ] - }, - "signature": "lsLAl2jL62ZApmOLZqWU3g==", - "last_modified": 1445850327000, - "model_name": "test_kylin_with_slr_model_desc", - "null_string": null, + }, "hbase_mapping": { "column_family": [ { - "name": "F1", + "name": "F1", "columns": [ { - "qualifier": "M", + "qualifier": "M", "measure_refs": [ - "GMV_SUM", - "GMV_MIN", - "GMV_MAX", - "TRANS_CNT", - "ITEM_COUNT_SUM" + "GMV_SUM", + "BUYER_LEVEL_SUM", + "SELLER_LEVEL_SUM", + "TRANS_CNT" + ] + } + ] + }, + { + "name": "F2", + "columns": [ + { + "qualifier": "M", + "measure_refs": [ + "SELLER_CNT_HLL", + "TOP_SELLER" ] } ] } ] - }, - "notify_list": null, - "auto_merge_time_ranges": null, - "retention_range": 0 + }, + "aggregation_groups": [ + { + "includes": 
[ + "KYLIN_SALES.PART_DT", + "KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME", + "KYLIN_SALES.LEAF_CATEG_ID", + "KYLIN_SALES.LSTG_FORMAT_NAME", + "KYLIN_SALES.LSTG_SITE_ID", + "KYLIN_SALES.OPS_USER_ID", + "KYLIN_SALES.OPS_REGION", + "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL", + "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL", + "BUYER_ACCOUNT.ACCOUNT_COUNTRY", + "SELLER_ACCOUNT.ACCOUNT_COUNTRY", + "BUYER_COUNTRY.NAME", + "SELLER_COUNTRY.NAME" + ], + "select_rule": { + "hierarchy_dims": [ + [ + "KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME", + "KYLIN_SALES.LEAF_CATEG_ID" + ] + ], + "mandatory_dims": [ + "KYLIN_SALES.PART_DT" + ], + "joint_dims": [ + [ + "BUYER_ACCOUNT.ACCOUNT_COUNTRY", + "BUYER_COUNTRY.NAME" + ], + [ + "SELLER_ACCOUNT.ACCOUNT_COUNTRY", + "SELLER_COUNTRY.NAME" + ], + [ + "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL", + "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL" + ], + [ + "KYLIN_SALES.LSTG_FORMAT_NAME", + "KYLIN_SALES.LSTG_SITE_ID" + ], + [ + "KYLIN_SALES.OPS_USER_ID", + "KYLIN_SALES.OPS_REGION" + ] + ] + } + } + ], + "signature": null, + "notify_list": [], + "status_need_notify": [], + "partition_date_start": 1325376000000, + "partition_date_end": 3153600000000, + "auto_merge_time_ranges": [], + "volatile_range": 0, + "retention_range": 0, + "engine_type": 2, + "storage_type": 2, + "override_kylin_properties": { + "kylin.cube.aggrgroup.is-mandatory-only-valid": "true", + "kylin.engine.spark.rdd-partition-cut-mb": "500" + }, + "cuboid_black_list": [], + "parent_forward": 3, + "mandatory_dimension_set_list": [], + "snapshot_table_desc_list": [] } ] ``` @@ -851,6 +1162,83 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js ## Delete Segment `DELETE /kylin/api/cubes/{cubeName}/segs/{segmentName}` + +## Auto-merge Segment +`PUT /kylin/api/cubes/{cubeName}/automerge` + + +## Get sql 
of a cube +`GET /kylin/api/cubes/{cubeName}/sql` + +#### Path variable +* cubeName - `required` `string` Cube name. + +#### Response Sample +```sh +{ + "sql": "SELECT\n`KYLIN_SALES`.`TRANS_ID` as `KYLIN_SALES_TRANS_ID`\n,`KYLIN_SALES`.`PART_DT` as `KYLIN_SALES_PART_DT`\n,`KYLIN_CAL_DT`.`YEAR_BEG_DT` as `KYLIN_CAL_DT_YEAR_BEG_DT`\n,`KYLIN_CAL_DT`.`MONTH_BEG_DT` as `KYLIN_CAL_DT_MONTH_BEG_DT`\n,`KYLIN_CAL_DT`.`WEEK_BEG_DT` as `KYLIN_CAL_DT_WEEK_BEG_DT`\n,`KYLIN_SALES`.`LEAF_CATEG_ID` as `KYLIN_SALES_LEAF_CATEG_ID`\n,`KYLIN_SALES`.`LSTG_SITE_ID` as `KYLIN_SALES_LSTG_SITE_ID`\n,`KYLIN_CATEGORY_GROUPINGS`.`USER_DEFINED_FIELD1` as `KYLIN_ [...] +} +``` + + +## Get sql of a cube segment +`GET /kylin/api/cubes/{cubeName}/segs/{segmentName}/sql` + +#### Path variable +* cubeName - `required` `string` Cube name. +* segmentName - `required` `string` Segment name. + +#### Response Sample +```sh +{ + "sql": "SELECT\n`KYLIN_SALES`.`TRANS_ID` as `KYLIN_SALES_TRANS_ID`\n,`KYLIN_SALES`.`PART_DT` as `KYLIN_SALES_PART_DT`\n,`KYLIN_CAL_DT`.`YEAR_BEG_DT` as `KYLIN_CAL_DT_YEAR_BEG_DT`\n,`KYLIN_CAL_DT`.`MONTH_BEG_DT` as `KYLIN_CAL_DT_MONTH_BEG_DT`\n,`KYLIN_CAL_DT`.`WEEK_BEG_DT` as `KYLIN_CAL_DT_WEEK_BEG_DT`\n,`KYLIN_SALES`.`LEAF_CATEG_ID` as `KYLIN_SALES_LEAF_CATEG_ID`\n,`KYLIN_SALES`.`LSTG_SITE_ID` as `KYLIN_SALES_LSTG_SITE_ID`\n,`KYLIN_CATEGORY_GROUPINGS`.`USER_DEFINED_FIELD1` as `KYLIN_ [...] 
+} +``` + + +## Force rebuild lookup table snapshot +`PUT /kylin/api/cubes/{cubeName}/refresh_lookup` + + +## Clone cube +`PUT /kylin/api/cubes/{cubeName}/clone` + + +## Delete cube +`DELETE /kylin/api/cubes/{cubeName}` + + +## Get hbase info +`GET /kylin/api/cubes/{cubeName}/hbase` + +#### Response Sample +```sh +[ + { + "segmentName": "20120101000000_20120103000000", + "segmentUUID": null, + "segmentStatus": "READY", + "tableName": "KYLIN_E1VT22737D", + "tableSize": 0, + "regionCount": 1, + "dateRangeStart": 1325376000000, + "dateRangeEnd": 1325548800000, + "sourceOffsetStart": 0, + "sourceOffsetEnd": 0, + "sourceCount": 29 + } +] +``` + + +## Get current cuboid +`GET /kylin/api/cubes/{cubeName}/cuboids/current` + + +## Migrate Cube +`POST /kylin/api/cubes/{cube}/{project}/migrate` + + *** ## Create Model @@ -862,7 +1250,7 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js * projectName - `required` `string` projectName to which model belongs #### Request Sample -``` +```sh { "modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] 
"modelName": "kylin_test_model", @@ -884,6 +1272,20 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js } ``` + +## Update Model +`PUT /kylin/api/models` + +#### Request Body +(Same as "Create Model") + +#### Request Sample +(Same as "Create Model") + +#### Response Sample +(Same as "Create Model") + + ## Get ModelDescData `GET /kylin/api/models` @@ -1089,6 +1491,9 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js #### Path variable * modelName - `required` `string` Model name. +## Clone Model +`PUT /kylin/api/models/{modelName}/clone` + *** ## Resume Job @@ -1269,6 +1674,81 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea } ] ``` + +## Get Job Status Overview +`GET /kylin/api/jobs/overview` + +### Request Variables +(Same as "Get job list") + +#### Response Sample +```sh +{ + "DISCARDED": 0, + "NEW": 0, + "STOPPED": 0, + "PENDING": 0, + "RUNNING": 0, + "FINISHED": 1, + "ERROR": 0 +} +``` + +## Resubmit realtime build job +`PUT /kylin/api/jobs/{jobId}/resubmit` + +## Rollback job +`PUT /kylin/api/{jobId}/steps/{stepId}/rollback` + +#### Path Parameters +* jobId - `required` `string` job id you want to rollback +* stepId - `required` `string` specify rollback step id, e.g.(Create Intermediate Flat Hive:1) + +For example, rollback job to Create Intermediate Flat Hive: +`PUT: kylin/api/jobs/4e84cb5e-a929-89c7-6240-768fa9835d89/steps/1/rollback` + +#### Response Sample +```sh +{ + "uuid": "4e84cb5e-a929-89c7-6240-768fa9835d89", + "last_modified": 1590054128311, + "version": "3.0.0.20500", + "name": "BUILD CUBE - kylin_sales_cube - 20120102000000_20120103000000 - CST 2020-05-21 17:38:59", + "projectName": "learn_kylin", + "type": "BUILD", + "duration": 187, + "related_cube": "kylin_sales_cube", + "display_cube_name": "kylin_sales_cube", + "related_segment": "fca98f62-cb3f-8b53-5bf1-94a85334560b", + "exec_start_time": 1590053963522, + "exec_end_time": 0, + 
"exec_interrupt_time": 0, + "mr_waiting": 39, + "steps": [ + { + "interruptCmd": null, + "id": "4e84cb5e-a929-89c7-6240-768fa9835d89-00", + "name": "Create Intermediate Flat Hive Table", + "sequence_id": 0, + "exec_cmd": null, + "interrupt_cmd": null, + "exec_start_time": 0, + "exec_end_time": 0, + "exec_wait_time": 0, + "step_status": "PENDING", + "cmd_type": "SHELL_CMD_HADOOP", + "info": {}, + "run_async": false + }, + ... + ], + "submitter": "ADMIN", + "job_status": "RUNNING", + "build_instance": "20984@host", + "progress": 0.0 +} +``` + *** ## Get Hive Table @@ -1376,6 +1856,29 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea "result.unloaded": ["sapmle_08"] } ``` + +## Unload Hive Tables +`DELETE /kylin/api/tables/{tables}/{project}` + +#### Path Parameters +* tables - `required` `string` table names you want to unload, separated with comma. +* project - `required` `String` the project which the tables belong to. + +#### Response Sample +```sh +{ + "result.unload.success": [ + "kylin_sales" + ], + "result.unload.fail": [] +} +``` + +## Show databases in hive +`GET /kylin/api/tables/hive` + +## Show tables in a hive database +`GET /kylin/api/tables/hive/{database}` *** @@ -1387,6 +1890,17 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea * name - `required` `string` Cache key, e.g the cube name. 
* action - `required` `string` 'create', 'update' or 'drop' + +## Announce wipe cache +`PUT /kylin/api/cache/announce/{type}/{name}/{action}` + +#### Path variable +(Same as "Wipe cache") + +## Hot load kylin config +`POST /kylin/api/cache/announce/config` + + *** ## Initiate cube start position @@ -1459,6 +1973,102 @@ This API is specific for stream cube's building; #### Path variable * cubeName - `required` `string` Cube name +## Get streaming configs +`GET /kylin/api/streaming/getConfig` + +#### Response sample +```sh +[ + { + "uuid": "8b2b9dfe-777c-4d39-bf89-8472ec929193", + "last_modified": 1587528491000, + "version": "3.0.0.20500", + "name": "DEFAULT.KYLIN_STREAMING_TABLE", + "type": "kafka" + } +] +``` + +## Get kafka configs +`GET /kylin/api/streaming/getKfkConfig` + +#### Response sample +```sh +[ + { + "uuid": "8b2b9dfe-777c-4d39-bf89-8472ec919193", + "last_modified": 1587528491000, + "version": "3.0.0.20500", + "name": "DEFAULT.KYLIN_STREAMING_TABLE", + "clusters": [ + { + "uuid": null, + "last_modified": 0, + "version": "3.0.0.20500", + "brokers": [ + { + "id": 0, + "host": "localhost", + "port": 9092 + } + ] + } + ], + "topic": "kylin_streaming_topic", + "timeout": 60000, + "parserName": "org.apache.kylin.source.kafka.TimedJsonStreamParser", + "timestampField": "order_time", + "margin": 0, + "splitRows": 1000000, + "parserProperties": null + } +] +``` + +## Create streaming schema +`POST /kylin/api/streaming` + +#### Request body +* project - `required` `string` Project which you want create streaming schema to. +* tableData - `required` `string` Streaming table desc. +* streamingConfig - `required` `string` Streaming config. +* kafkaConfig - `required` `string` Kafka config. 
+ +#### Request sample +``` +{ + "project":"test", + "tableData":"{\"uuid\": \"e286e39e-41d7-44c2-8fa2-41b365123987\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"KYLIN_TEST_STREAMING_TABLE\",\"columns\": [{\"id\": \"1\",\"name\": \"AMOUNT\",\"datatype\": \"decimal(19,4)\"},{\"id\": 3,\"name\": \"ORDER_TIME\",\"datatype\": \"timestamp\",\"index\": \"T\"}],\"source_type\": 1,\"table_type\": null,\"database\": \"DEFAULT\"}", + "streamingConfig":"{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec929193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_TEST_STREAMING_TABLE\",\"type\": \"kafka\"}", + "kafkaConfig":"{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec919193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_STREAMING_TABLE\",\"clusters\": [{\"uuid\": null,\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"brokers\": [{\"id\": 0,\"host\": \"localhost\",\"port\": 9092}]}],\"topic\": \"kylin_streaming_topic\",\"timeout\": 60000,\"parserName\": \"org.apache.kylin.source.kafka.TimedJsonStreamParser\",\"timestampField\": \"order_time\",\"margin\": 0,\ [...] 
+} +``` + +#### Response sample +```sh +{ + "project": "test", + "tableData": "{\"uuid\": \"e286e39e-41d7-44c2-8fa2-41b365123987\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"KYLIN_TEST_STREAMING_TABLE\",\"columns\": [{\"id\": \"1\",\"name\": \"AMOUNT\",\"datatype\": \"decimal(19,4)\"},{\"id\": 3,\"name\": \"ORDER_TIME\",\"datatype\": \"timestamp\",\"index\": \"T\"}],\"source_type\": 1,\"table_type\": null,\"database\": \"DEFAULT\"}", + "streamingConfig": "{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec929193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_TEST_STREAMING_TABLE\",\"type\": \"kafka\"}", + "kafkaConfig": "{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec919193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_STREAMING_TABLE\",\"clusters\": [{\"uuid\": null,\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"brokers\": [{\"id\": 0,\"host\": \"localhost\",\"port\": 9092}]}],\"topic\": \"kylin_streaming_topic\",\"timeout\": 60000,\"parserName\": \"org.apache.kylin.source.kafka.TimedJsonStreamParser\",\"timestampField\": \"order_time\",\"margin\": 0, [...] 
+ "successful": true, + "message": null +} +``` + +## Update streaming schema +`PUT /kylin/api/streaming` + +#### Request body +(Same as "Create streaming schema) + +#### Request sample +(Same as "Create streaming schema) + +#### Response sample +(Same as "Create streaming schema) + + *** ## Get users can query the table @@ -1495,28 +2105,6 @@ This API is specific for stream cube's building; * table - `required` `string` table name * name - `required` `string` user name or group name you want to delete from table blacklist -*** - -## Get all metrics -`GET /kylin/api/jmetrics/` - -#### Response sample -``` -{ - "version": "3.0.0", - "gauges": {}, - "counters": {}, - "histograms": {}, - "meters": {}, - "timers": {} -} -``` - -## Get specific type of metrics -`GET /kylin/api/jmetrics/{type}` - -#### Path variable -* type - `required` `string` Specific type of metrics you want, e.g meters ## Use RESTful API in Javascript diff --git a/website/_docs31/howto/howto_use_restapi.md b/website/_docs31/howto/howto_use_restapi.md index 32c6b71..428caa6 100644 --- a/website/_docs31/howto/howto_use_restapi.md +++ b/website/_docs31/howto/howto_use_restapi.md @@ -7,13 +7,20 @@ since: v0.7.1 --- This page lists the major RESTful APIs provided by Kylin. - -* Query +* Authentication * [Authentication](#authentication) +* Query * [Query](#query) + * [Prepare query](#prepare-query) + * [Save query](#save-query) + * [Remove saved query](#remove-saved-query) + * [Get saved queries](#get-saved-queries) + * [Get running queries](#get-running-queries) + * [Stop query](#stop-query) * [List queryable tables](#list-queryable-tables) * CUBE * [Create cube](#create-cube) + * [Update cube](#update-cube) * [List cubes](#list-cubes) * [Get cube](#get-cube) * [Get cube descriptor (dimension, measure info, etc)](#get-cube-descriptor) @@ -23,10 +30,21 @@ This page lists the major RESTful APIs provided by Kylin. 
* [Disable cube](#disable-cube) * [Purge cube](#purge-cube) * [Delete segment](#delete-segment) + * [Auto-Merge segment](#auto-merge-segment) + * [Get sql of a cube](#get-sql-of-a-cube) + * [Get sql of a cube segment](#get-sql-of-a-cube-segment) + * [Force rebuild lookup table snapshot](#force-rebuild-lookup-table-snapshot) + * [Clone cube](#clone-cube) + * [Delete Cube](#delete-cube) + * [Get hbase info](#get-hbase-info) + * [Get current cuboid](#get-current-cuboid) + * [Migrate cube](#migrate-cube) * MODEL * [Create model](#create-model) + * [Update model](#update-model) * [Get modelDescData](#get-modeldescdata) * [Delete model](#delete-model) + * [Clone model](#clone-model) * JOB * [Resume job](#resume-job) * [Pause job](#pause-job) @@ -35,26 +53,35 @@ This page lists the major RESTful APIs provided by Kylin. * [Get job status](#get-job-status) * [Get job step output](#get-job-step-output) * [Get job list](#get-job-list) + * [Get job status overview](#get-job-status-overview) + * [Resubmit realtime build job](#resubmit-realtime-build-job) + * [Rollback job](#rollback-job) * Metadata * [Get Hive Table](#get-hive-table) * [Get Hive Tables](#get-hive-tables) * [Load Hive Tables](#load-hive-tables) + * [Unload Hive Tables](#unload-hive-tables) + * [Show databases in hive](#show-databases-in-hive) + * [Show tables in a hive database](#show-tables-in-a-hive-database) * Cache * [Wipe cache](#wipe-cache) + * [Announce wipe cache](#announce-wipe-cache) + * [Hot load kylin config](#hot-load-kylin-config) * Streaming * [Initiate cube start position](#initiate-cube-start-position) * [Build stream cube](#build-stream-cube) * [Check segment holes](#check-segment-holes) * [Fill segment holes](#fill-segment-holes) + * [Get streaming configs](#get-streaming-configs) + * [Get Kafka configs](#get-kafka-configs) + * [Create streaming schema](#create-streaming-schema) + * [Update streaming tables](#update-streaming-schema) * ACL * [Get users can query the 
table](#get-users-can-query-the-table) * [Get users cannot query the table](#get-users-cannot-query-the-table) * [Put user to table blacklist](#put-user-to-table-blacklist) * [Delete user from table blacklist](#delete-user-from-table-blacklist) -* Metrics - * [Get all metrics](#get-all-metrics) - * [Get specific type of metrics](#get-specific-type-of-metrics) - + ## Authentication `POST /kylin/api/user/authentication` @@ -72,23 +99,33 @@ python -c "import base64; print base64.standard_b64encode('$UserName:$Password') #### Response Sample ```sh -{ - "userDetails":{ - "password":null, - "username":"sample", - "authorities":[ - { - "authority":"ROLE_ANALYST" - }, - { - "authority":"ROLE_MODELER" - } - ], - "accountNonExpired":true, - "accountNonLocked":true, - "credentialsNonExpired":true, - "enabled":true - } +{ + "userDetails": { + "username": "sample", + "password": "null", + "authorities": [ + { + "authority": "ROLE_ADMIN" + }, + { + "authority": "ROLE_ANALYST" + }, + { + "authority": "ROLE_MODELER" + }, + { + "authority": "ALL_USERS" + } + ], + "disabled": false, + "defaultPassword": false, + "locked": false, + "lockedTime": 0, + "wrongTime": 0, + "uuid": "3704ba8c-deb1-ac47-729d-c1039c1bd6ec", + "last_modified": 1585219480112, + "version": "3.0.0.20500" + } } ``` @@ -122,7 +159,6 @@ curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" * limit - `optional` `int` Query limit. If limit is set in sql, perPage will be ignored. * acceptPartial - `optional` `bool` Whether accept a partial result or not, default be "false". Set to "false" for production use. * project - `optional` `string` Project to perform query. Default value is 'DEFAULT'. -* backdoorToggles - `optional` `map` You can set a key value pair (`"DEBUG_TOGGLE_HIT_CUBE":"SimpleCube_01"`) to specific cube for your query. Default is empty map. 
#### Request Sample @@ -236,10 +272,86 @@ curl -X POST -H "Authorization: Basic XXXXXXXXX" -H "Content-Type: application/j ``` +## Prepare query +`POST /kylin/api/query/prestate` + +#### Request Body +* sql - `required` `string` The text of sql statement. +* offset - `optional` `int` Query offset. If offset is set in sql, curIndex will be ignored. +* limit - `optional` `int` Query limit. If limit is set in sql, perPage will be ignored. +* acceptPartial - `optional` `bool` Whether accept a partial result or not, default be "false". Set to "false" for production use. +* project - `optional` `string` Project to perform query. Default value is 'DEFAULT'. + +#### Request Sample + +```sh +{ + "sql":"select * from TEST_KYLIN_FACT", + "offset":0, + "limit":50000, + "acceptPartial":false, + "project":"DEFAULT" +} +``` + + +## Save query +`POST /kylin/api/saved_queries` + +#### Request Body +* sql - `required` `string` The text of sql statement. +* name - `required` `string` Sql name. +* project - `required` `string` Project to perform query. +* description - `optional` `string` Sql description. + +#### Request Sample + +```sh +{ + "sql": "select count(*) from kylin_sales", + "name": "test", + "project": "learn_kylin" +} +``` + + +## Remove saved query +`DELETE /kylin/api/saved_queries/{id}` + +#### Request Parameters +* id - `required` `string` The id of saved query you want to remove + + +## Get saved queries +`GET /kylin/api/saved_queries` + +#### Response Sample +``` +[ + { + "name": "test", + "project": "learn_kylin", + "sql": "select count(*) from kylin_sales", + "description": null, + "id": "-1674470999" + } +] +``` + +## Get running queries +`GET /kylin/api/query/runningQueries` + +## Stop Query +`PUT /kylin/api/query/{queryId}/stop` + +#### Path Variable +* queryId - `required` `String` The queryId of you want to stop. You can obtain it by `Get running queries`. 
+ + ## List queryable tables `GET /kylin/api/tables_and_columns` -#### Request Parameters +#### Path Variable * project - `required` `string` The project to load tables #### Response Sample @@ -346,6 +458,19 @@ curl -X POST -H "Authorization: Basic XXXXXXXXX" -H "Content-Type: application/j } ``` +## Update Cube +`PUT /kylin/api/cubes` + +#### Request Body +(Same as "Create Cube") + +#### Request Sample +(Same as "Create Cube") + +#### Response Sample +(Same as "Create Cube") + + ## List cubes `GET /kylin/api/cubes` @@ -394,252 +519,438 @@ Get descriptor for specified cube instance. ```sh [ { - "uuid": "a24ca905-1fc6-4f67-985c-38fa5aeafd92", - "name": "test_kylin_cube_with_slr_desc", - "description": null, + "uuid": "0ef9b7a8-3929-4dff-b59d-2100aadc8dbf", + "last_modified": 1574402902000, + "version": "3.0.0.20500", + "name": "kylin_sales_cube", + "is_draft": false, + "model_name": "kylin_sales_model", + "description": "", + "null_string": null, "dimensions": [ { - "id": 0, - "name": "CAL_DT", - "table": "EDW.TEST_CAL_DT", - "column": null, - "derived": [ - "WEEK_BEG_DT" - ], - "hierarchy": false - }, + "name": "TRANS_ID", + "table": "KYLIN_SALES", + "column": "TRANS_ID", + "derived": null + }, { - "id": 1, - "name": "CATEGORY", - "table": "DEFAULT.TEST_CATEGORY_GROUPINGS", - "column": null, + "name": "YEAR_BEG_DT", + "table": "KYLIN_CAL_DT", + "column": null, "derived": [ - "USER_DEFINED_FIELD1", - "USER_DEFINED_FIELD3", - "UPD_DATE", - "UPD_USER" - ], - "hierarchy": false - }, + "YEAR_BEG_DT" + ] + }, { - "id": 2, - "name": "CATEGORY_HIERARCHY", - "table": "DEFAULT.TEST_CATEGORY_GROUPINGS", - "column": [ - "META_CATEG_NAME", - "CATEG_LVL2_NAME", - "CATEG_LVL3_NAME" - ], - "derived": null, - "hierarchy": true - }, + "name": "MONTH_BEG_DT", + "table": "KYLIN_CAL_DT", + "column": null, + "derived": [ + "MONTH_BEG_DT" + ] + }, { - "id": 3, - "name": "LSTG_FORMAT_NAME", - "table": "DEFAULT.TEST_KYLIN_FACT", - "column": [ - "LSTG_FORMAT_NAME" - ], - "derived": null, - 
"hierarchy": false - }, + "name": "WEEK_BEG_DT", + "table": "KYLIN_CAL_DT", + "column": null, + "derived": [ + "WEEK_BEG_DT" + ] + }, { - "id": 4, - "name": "SITE_ID", - "table": "EDW.TEST_SITES", - "column": null, + "name": "USER_DEFINED_FIELD1", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": null, "derived": [ - "SITE_NAME", - "CRE_USER" - ], - "hierarchy": false - }, + "USER_DEFINED_FIELD1" + ] + }, { - "id": 5, - "name": "SELLER_TYPE_CD", - "table": "EDW.TEST_SELLER_TYPE_DIM", - "column": null, + "name": "USER_DEFINED_FIELD3", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": null, "derived": [ - "SELLER_TYPE_DESC" - ], - "hierarchy": false - }, + "USER_DEFINED_FIELD3" + ] + }, { - "id": 6, - "name": "SELLER_ID", - "table": "DEFAULT.TEST_KYLIN_FACT", - "column": [ - "SELLER_ID" - ], - "derived": null, - "hierarchy": false + "name": "META_CATEG_NAME", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": "META_CATEG_NAME", + "derived": null + }, + { + "name": "CATEG_LVL2_NAME", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": "CATEG_LVL2_NAME", + "derived": null + }, + { + "name": "CATEG_LVL3_NAME", + "table": "KYLIN_CATEGORY_GROUPINGS", + "column": "CATEG_LVL3_NAME", + "derived": null + }, + { + "name": "LSTG_FORMAT_NAME", + "table": "KYLIN_SALES", + "column": "LSTG_FORMAT_NAME", + "derived": null + }, + { + "name": "SELLER_ID", + "table": "KYLIN_SALES", + "column": "SELLER_ID", + "derived": null + }, + { + "name": "BUYER_ID", + "table": "KYLIN_SALES", + "column": "BUYER_ID", + "derived": null + }, + { + "name": "ACCOUNT_BUYER_LEVEL", + "table": "BUYER_ACCOUNT", + "column": "ACCOUNT_BUYER_LEVEL", + "derived": null + }, + { + "name": "ACCOUNT_SELLER_LEVEL", + "table": "SELLER_ACCOUNT", + "column": "ACCOUNT_SELLER_LEVEL", + "derived": null + }, + { + "name": "BUYER_COUNTRY", + "table": "BUYER_ACCOUNT", + "column": "ACCOUNT_COUNTRY", + "derived": null + }, + { + "name": "SELLER_COUNTRY", + "table": "SELLER_ACCOUNT", + "column": "ACCOUNT_COUNTRY", + "derived": 
null + }, + { + "name": "BUYER_COUNTRY_NAME", + "table": "BUYER_COUNTRY", + "column": "NAME", + "derived": null + }, + { + "name": "SELLER_COUNTRY_NAME", + "table": "SELLER_COUNTRY", + "column": "NAME", + "derived": null + }, + { + "name": "OPS_USER_ID", + "table": "KYLIN_SALES", + "column": "OPS_USER_ID", + "derived": null + }, + { + "name": "OPS_REGION", + "table": "KYLIN_SALES", + "column": "OPS_REGION", + "derived": null } - ], + ], "measures": [ { - "id": 1, - "name": "GMV_SUM", + "name": "GMV_SUM", "function": { - "expression": "SUM", + "expression": "SUM", "parameter": { - "type": "column", - "value": "PRICE", - "next_parameter": null - }, + "type": "column", + "value": "KYLIN_SALES.PRICE" + }, "returntype": "decimal(19,4)" - }, - "dependent_measure_ref": null - }, + } + }, { - "id": 2, - "name": "GMV_MIN", + "name": "BUYER_LEVEL_SUM", "function": { - "expression": "MIN", + "expression": "SUM", "parameter": { - "type": "column", - "value": "PRICE", - "next_parameter": null - }, - "returntype": "decimal(19,4)" - }, - "dependent_measure_ref": null - }, + "type": "column", + "value": "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL" + }, + "returntype": "bigint" + } + }, { - "id": 3, - "name": "GMV_MAX", + "name": "SELLER_LEVEL_SUM", "function": { - "expression": "MAX", + "expression": "SUM", "parameter": { - "type": "column", - "value": "PRICE", - "next_parameter": null - }, - "returntype": "decimal(19,4)" - }, - "dependent_measure_ref": null - }, + "type": "column", + "value": "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL" + }, + "returntype": "bigint" + } + }, { - "id": 4, - "name": "TRANS_CNT", + "name": "TRANS_CNT", "function": { - "expression": "COUNT", + "expression": "COUNT", "parameter": { - "type": "constant", - "value": "1", - "next_parameter": null - }, + "type": "constant", + "value": "1" + }, "returntype": "bigint" - }, - "dependent_measure_ref": null - }, + } + }, { - "id": 5, - "name": "ITEM_COUNT_SUM", + "name": "SELLER_CNT_HLL", "function": { - "expression": "SUM", 
+ "expression": "COUNT_DISTINCT", "parameter": { - "type": "column", - "value": "ITEM_COUNT", - "next_parameter": null - }, - "returntype": "bigint" - }, - "dependent_measure_ref": null + "type": "column", + "value": "KYLIN_SALES.SELLER_ID" + }, + "returntype": "hllc(10)" + } + }, + { + "name": "TOP_SELLER", + "function": { + "expression": "TOP_N", + "parameter": { + "type": "column", + "value": "KYLIN_SALES.PRICE", + "next_parameter": { + "type": "column", + "value": "KYLIN_SALES.SELLER_ID" + } + }, + "returntype": "topn(100)", + "configuration": { + "topn.encoding.KYLIN_SALES.SELLER_ID": "dict", + "topn.encoding_version.KYLIN_SALES.SELLER_ID": "1" + } + } } - ], + ], "rowkey": { "rowkey_columns": [ { - "column": "SELLER_ID", - "length": 18, - "dictionary": null, - "mandatory": true - }, + "column": "KYLIN_SALES.BUYER_ID", + "encoding": "integer:4", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.SELLER_ID", + "encoding": "integer:4", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.TRANS_ID", + "encoding": "integer:4", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.PART_DT", + "encoding": "date", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.LEAF_CATEG_ID", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "CAL_DT", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "LEAF_CATEG_ID", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "META_CATEG_NAME", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME", + "encoding": "dict", 
+ "encoding_version": 1, + "isShardBy": false + }, { - "column": "CATEG_LVL2_NAME", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "CATEG_LVL3_NAME", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "LSTG_FORMAT_NAME", - "length": 12, - "dictionary": null, - "mandatory": false - }, + "column": "BUYER_ACCOUNT.ACCOUNT_COUNTRY", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "LSTG_SITE_ID", - "length": 0, - "dictionary": "true", - "mandatory": false - }, + "column": "SELLER_ACCOUNT.ACCOUNT_COUNTRY", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, { - "column": "SLR_SEGMENT_CD", - "length": 0, - "dictionary": "true", - "mandatory": false + "column": "BUYER_COUNTRY.NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "SELLER_COUNTRY.NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.LSTG_FORMAT_NAME", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.LSTG_SITE_ID", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.OPS_USER_ID", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false + }, + { + "column": "KYLIN_SALES.OPS_REGION", + "encoding": "dict", + "encoding_version": 1, + "isShardBy": false } - ], - "aggregation_groups": [ - [ - "LEAF_CATEG_ID", - "META_CATEG_NAME", - "CATEG_LVL2_NAME", - "CATEG_LVL3_NAME", - "CAL_DT" - ] ] - }, - "signature": "lsLAl2jL62ZApmOLZqWU3g==", - "last_modified": 1445850327000, - "model_name": "test_kylin_with_slr_model_desc", - "null_string": null, + }, 
"hbase_mapping": { "column_family": [ { - "name": "F1", + "name": "F1", "columns": [ { - "qualifier": "M", + "qualifier": "M", "measure_refs": [ - "GMV_SUM", - "GMV_MIN", - "GMV_MAX", - "TRANS_CNT", - "ITEM_COUNT_SUM" + "GMV_SUM", + "BUYER_LEVEL_SUM", + "SELLER_LEVEL_SUM", + "TRANS_CNT" + ] + } + ] + }, + { + "name": "F2", + "columns": [ + { + "qualifier": "M", + "measure_refs": [ + "SELLER_CNT_HLL", + "TOP_SELLER" ] } ] } ] - }, - "notify_list": null, - "auto_merge_time_ranges": null, - "retention_range": 0 + }, + "aggregation_groups": [ + { + "includes": [ + "KYLIN_SALES.PART_DT", + "KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME", + "KYLIN_SALES.LEAF_CATEG_ID", + "KYLIN_SALES.LSTG_FORMAT_NAME", + "KYLIN_SALES.LSTG_SITE_ID", + "KYLIN_SALES.OPS_USER_ID", + "KYLIN_SALES.OPS_REGION", + "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL", + "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL", + "BUYER_ACCOUNT.ACCOUNT_COUNTRY", + "SELLER_ACCOUNT.ACCOUNT_COUNTRY", + "BUYER_COUNTRY.NAME", + "SELLER_COUNTRY.NAME" + ], + "select_rule": { + "hierarchy_dims": [ + [ + "KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME", + "KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME", + "KYLIN_SALES.LEAF_CATEG_ID" + ] + ], + "mandatory_dims": [ + "KYLIN_SALES.PART_DT" + ], + "joint_dims": [ + [ + "BUYER_ACCOUNT.ACCOUNT_COUNTRY", + "BUYER_COUNTRY.NAME" + ], + [ + "SELLER_ACCOUNT.ACCOUNT_COUNTRY", + "SELLER_COUNTRY.NAME" + ], + [ + "BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL", + "SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL" + ], + [ + "KYLIN_SALES.LSTG_FORMAT_NAME", + "KYLIN_SALES.LSTG_SITE_ID" + ], + [ + "KYLIN_SALES.OPS_USER_ID", + "KYLIN_SALES.OPS_REGION" + ] + ] + } + } + ], + "signature": null, + "notify_list": [], + "status_need_notify": [], + "partition_date_start": 1325376000000, + "partition_date_end": 3153600000000, + "auto_merge_time_ranges": [], + "volatile_range": 0, + "retention_range": 0, + "engine_type": 
2, + "storage_type": 2, + "override_kylin_properties": { + "kylin.cube.aggrgroup.is-mandatory-only-valid": "true", + "kylin.engine.spark.rdd-partition-cut-mb": "500" + }, + "cuboid_black_list": [], + "parent_forward": 3, + "mandatory_dimension_set_list": [], + "snapshot_table_desc_list": [] } ] ``` @@ -851,6 +1162,83 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js ## Delete Segment `DELETE /kylin/api/cubes/{cubeName}/segs/{segmentName}` + +## Auto-merge Segment +`PUT /kylin/api/cubes/{cubeName}/automerge` + + +## Get sql of a cube +`GET /kylin/api/cubes/{cubeName}/sql` + +#### Path variable +* cubeName - `required` `string` Cube name. + +#### Response Sample +```sh +{ + "sql": "SELECT\n`KYLIN_SALES`.`TRANS_ID` as `KYLIN_SALES_TRANS_ID`\n,`KYLIN_SALES`.`PART_DT` as `KYLIN_SALES_PART_DT`\n,`KYLIN_CAL_DT`.`YEAR_BEG_DT` as `KYLIN_CAL_DT_YEAR_BEG_DT`\n,`KYLIN_CAL_DT`.`MONTH_BEG_DT` as `KYLIN_CAL_DT_MONTH_BEG_DT`\n,`KYLIN_CAL_DT`.`WEEK_BEG_DT` as `KYLIN_CAL_DT_WEEK_BEG_DT`\n,`KYLIN_SALES`.`LEAF_CATEG_ID` as `KYLIN_SALES_LEAF_CATEG_ID`\n,`KYLIN_SALES`.`LSTG_SITE_ID` as `KYLIN_SALES_LSTG_SITE_ID`\n,`KYLIN_CATEGORY_GROUPINGS`.`USER_DEFINED_FIELD1` as `KYLIN_ [...] +} +``` + + +## Get sql of a cube segment +`GET /kylin/api/cubes/{cubeName}/segs/{segmentName}/sql` + +#### Path variable +* cubeName - `required` `string` Cube name. +* segmentName - `required` `string` Segment name. + +#### Response Sample +```sh +{ + "sql": "SELECT\n`KYLIN_SALES`.`TRANS_ID` as `KYLIN_SALES_TRANS_ID`\n,`KYLIN_SALES`.`PART_DT` as `KYLIN_SALES_PART_DT`\n,`KYLIN_CAL_DT`.`YEAR_BEG_DT` as `KYLIN_CAL_DT_YEAR_BEG_DT`\n,`KYLIN_CAL_DT`.`MONTH_BEG_DT` as `KYLIN_CAL_DT_MONTH_BEG_DT`\n,`KYLIN_CAL_DT`.`WEEK_BEG_DT` as `KYLIN_CAL_DT_WEEK_BEG_DT`\n,`KYLIN_SALES`.`LEAF_CATEG_ID` as `KYLIN_SALES_LEAF_CATEG_ID`\n,`KYLIN_SALES`.`LSTG_SITE_ID` as `KYLIN_SALES_LSTG_SITE_ID`\n,`KYLIN_CATEGORY_GROUPINGS`.`USER_DEFINED_FIELD1` as `KYLIN_ [...] 
+} +``` + + +## Force rebuild lookup table snapshot +`PUT /kylin/api/cubes/{cubeName}/refresh_lookup` + + +## Clone cube +`PUT /kylin/api/cubes/{cubeName}/clone` + + +## Delete cube +`DELETE /kylin/api/cubes/{cubeName}` + + +## Get hbase info +`GET /kylin/api/cubes/{cubeName}/hbase` + +#### Response Sample +```sh +[ + { + "segmentName": "20120101000000_20120103000000", + "segmentUUID": null, + "segmentStatus": "READY", + "tableName": "KYLIN_E1VT22737D", + "tableSize": 0, + "regionCount": 1, + "dateRangeStart": 1325376000000, + "dateRangeEnd": 1325548800000, + "sourceOffsetStart": 0, + "sourceOffsetEnd": 0, + "sourceCount": 29 + } +] +``` + + +## Get current cuboid +`GET /kylin/api/cubes/{cubeName}/cuboids/current` + + +## Migrate Cube +`POST /kylin/api/cubes/{cube}/{project}/migrate` + + *** ## Create Model @@ -862,20 +1250,20 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js * projectName - `required` `string` projectName to which model belongs #### Request Sample -``` +```sh { "modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] 
"modelName": "kylin_test_model", "project": "learn_kylin" } ``` - + #### Response Sample -``` +```sh { "uuid": "2613d739-14c1-38ac-2e37-f36e46fd9976", "modelName": "kylin_test_model", -"modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] +"modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] "successful": true, "message": null, "project": "learn_kylin", @@ -884,10 +1272,24 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js } ``` + +## Update Model +`PUT /kylin/api/models` + +#### Request Body +(Same as "Create Model") + +#### Request Sample +(Same as "Create Model") + +#### Response Sample +(Same as "Create Model") + + ## Get ModelDescData `GET /kylin/api/models` -##### Request Parameters +#### Request Parameters * modelName - `optional` `string` Model name. * projectName - `optional` `string` Project Name. 
* limit - `optional` `integer` Offset used by pagination @@ -1087,7 +1489,10 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js `DELETE /kylin/api/models/{modelName}` #### Path variable -* modelName - `required` `string` Model name you want delete. +* modelName - `required` `string` Model name. + +## Clone Model +`PUT /kylin/api/models/{modelName}/clone` *** @@ -1216,7 +1621,6 @@ For example, to get the job list in project 'learn_kylin' for cube 'kylin_sales_ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=learn_kylin&timeFilter=1 ``` - #### Response Sample ``` [ @@ -1270,6 +1674,81 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea } ] ``` + +## Get Job Status Overview +`GET /kylin/api/jobs/overview` + +### Request Variables +(Same as "Get job list") + +#### Response Sample +```sh +{ + "DISCARDED": 0, + "NEW": 0, + "STOPPED": 0, + "PENDING": 0, + "RUNNING": 0, + "FINISHED": 1, + "ERROR": 0 +} +``` + +## Resubmit realtime build job +`PUT /kylin/api/jobs/{jobId}/resubmit` + +## Rollback job +`PUT /kylin/api/{jobId}/steps/{stepId}/rollback` + +#### Path Parameters +* jobId - `required` `string` job id you want to rollback +* stepId - `required` `string` specify rollback step id, e.g.(Create Intermediate Flat Hive:1) + +For example, rollback job to Create Intermediate Flat Hive: +`PUT: kylin/api/jobs/4e84cb5e-a929-89c7-6240-768fa9835d89/steps/1/rollback` + +#### Response Sample +```sh +{ + "uuid": "4e84cb5e-a929-89c7-6240-768fa9835d89", + "last_modified": 1590054128311, + "version": "3.0.0.20500", + "name": "BUILD CUBE - kylin_sales_cube - 20120102000000_20120103000000 - CST 2020-05-21 17:38:59", + "projectName": "learn_kylin", + "type": "BUILD", + "duration": 187, + "related_cube": "kylin_sales_cube", + "display_cube_name": "kylin_sales_cube", + "related_segment": "fca98f62-cb3f-8b53-5bf1-94a85334560b", + "exec_start_time": 1590053963522, + "exec_end_time": 0, + 
"exec_interrupt_time": 0, + "mr_waiting": 39, + "steps": [ + { + "interruptCmd": null, + "id": "4e84cb5e-a929-89c7-6240-768fa9835d89-00", + "name": "Create Intermediate Flat Hive Table", + "sequence_id": 0, + "exec_cmd": null, + "interrupt_cmd": null, + "exec_start_time": 0, + "exec_end_time": 0, + "exec_wait_time": 0, + "step_status": "PENDING", + "cmd_type": "SHELL_CMD_HADOOP", + "info": {}, + "run_async": false + }, + ... + ], + "submitter": "ADMIN", + "job_status": "RUNNING", + "build_instance": "20984@host", + "progress": 0.0 +} +``` + *** ## Get Hive Table @@ -1377,6 +1856,29 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea "result.unloaded": ["sapmle_08"] } ``` + +## Unload Hive Tables +`DELETE /kylin/api/tables/{tables}/{project}` + +#### Path Parameters +* tables - `required` `string` table names you want to unload, separated with comma. +* project - `required` `String` the project which the tables belong to. + +#### Response Sample +```sh +{ + "result.unload.success": [ + "kylin_sales" + ], + "result.unload.fail": [] +} +``` + +## Show databases in hive +`GET /kylin/api/tables/hive` + +## Show tables in a hive database +`GET /kylin/api/tables/hive/{database}` *** @@ -1388,6 +1890,17 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea * name - `required` `string` Cache key, e.g the cube name. 
* action - `required` `string` 'create', 'update' or 'drop' + +## Announce wipe cache +`PUT /kylin/api/cache/announce/{type}/{name}/{action}` + +#### Path variable +(Same as "Wipe cache") + +## Hot load kylin config +`POST /kylin/api/cache/announce/config` + + *** ## Initiate cube start position @@ -1460,6 +1973,102 @@ This API is specific for stream cube's building; #### Path variable * cubeName - `required` `string` Cube name +## Get streaming configs +`GET /kylin/api/streaming/getConfig` + +#### Response sample +```sh +[ + { + "uuid": "8b2b9dfe-777c-4d39-bf89-8472ec929193", + "last_modified": 1587528491000, + "version": "3.0.0.20500", + "name": "DEFAULT.KYLIN_STREAMING_TABLE", + "type": "kafka" + } +] +``` + +## Get kafka configs +`GET /kylin/api/streaming/getKfkConfig` + +#### Response sample +```sh +[ + { + "uuid": "8b2b9dfe-777c-4d39-bf89-8472ec919193", + "last_modified": 1587528491000, + "version": "3.0.0.20500", + "name": "DEFAULT.KYLIN_STREAMING_TABLE", + "clusters": [ + { + "uuid": null, + "last_modified": 0, + "version": "3.0.0.20500", + "brokers": [ + { + "id": 0, + "host": "localhost", + "port": 9092 + } + ] + } + ], + "topic": "kylin_streaming_topic", + "timeout": 60000, + "parserName": "org.apache.kylin.source.kafka.TimedJsonStreamParser", + "timestampField": "order_time", + "margin": 0, + "splitRows": 1000000, + "parserProperties": null + } +] +``` + +## Create streaming schema +`POST /kylin/api/streaming` + +#### Request body +* project - `required` `string` Project which you want create streaming schema to. +* tableData - `required` `string` Streaming table desc. +* streamingConfig - `required` `string` Streaming config. +* kafkaConfig - `required` `string` Kafka config. 
+ +#### Request sample +``` +{ + "project":"test", + "tableData":"{\"uuid\": \"e286e39e-41d7-44c2-8fa2-41b365123987\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"KYLIN_TEST_STREAMING_TABLE\",\"columns\": [{\"id\": \"1\",\"name\": \"AMOUNT\",\"datatype\": \"decimal(19,4)\"},{\"id\": 3,\"name\": \"ORDER_TIME\",\"datatype\": \"timestamp\",\"index\": \"T\"}],\"source_type\": 1,\"table_type\": null,\"database\": \"DEFAULT\"}", + "streamingConfig":"{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec929193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_TEST_STREAMING_TABLE\",\"type\": \"kafka\"}", + "kafkaConfig":"{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec919193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_STREAMING_TABLE\",\"clusters\": [{\"uuid\": null,\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"brokers\": [{\"id\": 0,\"host\": \"localhost\",\"port\": 9092}]}],\"topic\": \"kylin_streaming_topic\",\"timeout\": 60000,\"parserName\": \"org.apache.kylin.source.kafka.TimedJsonStreamParser\",\"timestampField\": \"order_time\",\"margin\": 0,\ [...] 
+} +``` + +#### Response sample +```sh +{ + "project": "test", + "tableData": "{\"uuid\": \"e286e39e-41d7-44c2-8fa2-41b365123987\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"KYLIN_TEST_STREAMING_TABLE\",\"columns\": [{\"id\": \"1\",\"name\": \"AMOUNT\",\"datatype\": \"decimal(19,4)\"},{\"id\": 3,\"name\": \"ORDER_TIME\",\"datatype\": \"timestamp\",\"index\": \"T\"}],\"source_type\": 1,\"table_type\": null,\"database\": \"DEFAULT\"}", + "streamingConfig": "{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec929193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_TEST_STREAMING_TABLE\",\"type\": \"kafka\"}", + "kafkaConfig": "{\"uuid\": \"8b2b9dfe-777c-4d39-bf89-8472ec919193\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"DEFAULT.KYLIN_STREAMING_TABLE\",\"clusters\": [{\"uuid\": null,\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"brokers\": [{\"id\": 0,\"host\": \"localhost\",\"port\": 9092}]}],\"topic\": \"kylin_streaming_topic\",\"timeout\": 60000,\"parserName\": \"org.apache.kylin.source.kafka.TimedJsonStreamParser\",\"timestampField\": \"order_time\",\"margin\": 0, [...] 
+ "successful": true, + "message": null +} +``` + +## Update streaming schema +`PUT /kylin/api/streaming` + +#### Request body +(Same as "Create streaming schema") + +#### Request sample +(Same as "Create streaming schema") + +#### Response sample +(Same as "Create streaming schema") + + *** ## Get users can query the table @@ -1496,28 +2105,6 @@ This API is specific for stream cube's building; * table - `required` `string` table name * name - `required` `string` user name or group name you want to delete from table blacklist -*** - -## Get all metrics -`GET /kylin/api/jmetrics/` - -#### Response sample -``` -{ - "version": "3.0.0", - "gauges": {}, - "counters": {}, - "histograms": {}, - "meters": {}, - "timers": {} -} -``` - -## Get specific type of metrics -`GET /kylin/api/jmetrics/{type}` - -#### Path variable -* type - `required` `string` Specific type of metrics you want, e.g meters ## Use RESTful API in Javascript diff --git a/website/_docs31/security.md b/website/_docs31/security.md new file mode 100644 index 0000000..691ad0e --- /dev/null +++ b/website/_docs31/security.md @@ -0,0 +1,76 @@ +--- +layout: docs +title: Security Issues +categories: docs +permalink: /docs31/security.html +--- + +### [CVE-2020-1937](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-1937) Apache Kylin SQL injection vulnerability + +__Severity__ + +Important + +__Vendor__ + +The Apache Software Foundation + + +__Versions Affected__ + +Kylin 2.3.0 to 2.3.2 + +Kylin 2.4.0 to 2.4.1 + +Kylin 2.5.0 to 2.5.2 + +Kylin 2.6.0 to 2.6.4 + +Kylin 3.0.0-alpha, Kylin 3.0.0-alpha2, Kylin 3.0.0-beta, Kylin 3.0.0 + +__Description__ + +Kylin has some RESTful APIs that concatenate SQL with user input strings, so a user is likely to be able to run malicious database queries. 
+ +__Mitigation__ + +Users should upgrade to 3.0.1 or 2.6.5 + +__Credit__ + +This issue was discovered by Jonathan Leitschuh + +### [CVE-2020-1956](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-1956) Apache Kylin command injection vulnerability + +__Severity__ + + +Important + +__Vendor__ + +The Apache Software Foundation + +__Versions Affected__ + +Kylin 2.3.0 to 2.3.2 + +Kylin 2.4.0 to 2.4.1 + +Kylin 2.5.0 to 2.5.2 + +Kylin 2.6.0 to 2.6.5 + +Kylin 3.0.0-alpha, Kylin 3.0.0-alpha2, Kylin 3.0.0-beta, Kylin 3.0.0, Kylin 3.0.1 + +__Description__ + +Kylin has some RESTful APIs that concatenate OS commands with user input strings, so a user is likely to be able to execute any OS command without any protection or validation. + +__Mitigation__ + +Users should upgrade to 3.0.2 or 2.6.6, or set kylin.tool.auto-migrate-cube.enabled to false to disable command execution. + +__Credit__ + +This issue was discovered by Johannes Dahse diff --git a/website/_docs31/tutorial/lambda_mode_and_timezone_realtime_olap.md b/website/_docs31/tutorial/lambda_mode_and_timezone_realtime_olap.md new file mode 100644 index 0000000..7eced6a --- /dev/null +++ b/website/_docs31/tutorial/lambda_mode_and_timezone_realtime_olap.md @@ -0,0 +1,175 @@ +--- +layout: docs +title: Lambda mode and Timezone in Real-time OLAP +categories: tutorial +permalink: /docs31/tutorial/lambda_mode_and_timezone_realtime_olap.html +--- + +Kylin v3.0.0 introduces the real-time OLAP feature. Powered by the newly added streaming receiver cluster, Kylin can query streaming data with sub-second latency. You can check [this tech blog](/blog/2019/04/12/rt-streaming-design/) for the overall design and core concepts. + +If you want a step-by-step tutorial, please check [this tutorial](/docs/tutorial/realtime_olap.html). +In this article, we will introduce how to update segments and set the timezone for derived time columns in a realtime OLAP cube. 
+ +# Background + +Say we have a Kafka message which looks like this: + +{% highlight Groff markup %} +{ + "s_nation":"SAUDI ARABIA", + "lo_supplycost":74292, + "p_category":"MFGR#0910", + "local_day_hour_minute":"09_21_44", + "event_time":"2019-12-09 08:44:50.000-0500", + "local_day_hour":"09_21", + "lo_quantity":12, + "lo_revenue":1411548, + "p_brand":"MFGR#0910051", + "s_region":"MIDDLE EAST", + "lo_discount":5, + "customer_info":{ + "CITY":"CHINA 057", + "REGION":"ASIA", + "street":"CHINA 05721", + "NATION":"CHINA" + }, + "d_year":1994, + "d_weeknuminyear":30, + "p_mfgr":"MFGR#09", + "v_revenue":7429200, + "d_yearmonth":"Jul1994", + "s_city":"SAUDI ARA15", + "profit_ratio":0.05263157894736842, + "d_yearmonthnum":199407, + "round":1 +} +{% endhighlight %} + +This sample comes from SSB with some additional fields such as `event_time`, which stands for the timestamp of the current event. +We assume that events come from countries in different timezones; "2019-12-09 08:44:50.000-0500" indicates that the event uses the `America/New_York` timezone. You may have some events which come from `Asia/Shanghai` as well. + +`local_day_hour_minute` is a column whose value is in a local timezone, e.g. "GMT+8" in the above sample. + +### Question +When performing real-time OLAP analysis with Kylin, you may have concerns including: + +1. Will events in different timezones cause incorrect query results? +2. How can I correct the data when Kafka messages contain a value that is not what I want, say a misspelled dimension value? +3. How can I retrieve long-late messages which have been dropped? +4. My query only hits a small range of time; how should I write the filter condition to make sure unused segments are pruned/skipped from the scan? + +### Quick Answer +For the first question, you can always get the correct result in the timezone of your location by setting `kylin.stream.event.timezone=GMT+N` for all Kylin processes. 
By default, UTC is used for the *derived time column*. + +For the second and third questions, you cannot update/append segments to a normal streaming cube, but you can update/append segments to a streaming cube which is in lambda mode. All you need to prepare is a Hive table which is mapped to your Kafka events. + +For the fourth question, you can achieve this by adding a *derived time column* such as `MINUTE_START`/`DAY_START` to your filter condition. + +# How to do + +### Configure timezone +Messages may come from different timezones, but you want query results in one specific timezone. +For example, if you live in a place in GMT+2, please set `kylin.stream.event.timezone=GMT+2` for all Kylin processes. + + +### Create lambda table + +You should create a Hive table in the *default* namespace, and this table should contain all your dimension and measure columns. Please + remember to include derived time columns like `MINUTE_START`/`DAY_START` if you set them as dimension columns in your cube. + +Depending on the granularity level at which you want to update segments, you can choose `HOUR_START` or `DAY_START` as the partition column of this Hive table. 
+ +{% highlight Groff markup %} +use default; +CREATE EXTERNAL TABLE IF NOT EXISTS lambda_flat_table +( +-- event timestamp and debug purpose column +EVENT_TIME timestamp +,ROUND bigint COMMENT "For debug purpose, in which round did this event sent by producer" +,LOCAL_DAY_HOUR string COMMENT "For debug purpose, maybe check timezone etc" +,LOCAL_MINUTE string COMMENT "For debug purpose, maybe check timezone etc" + +-- dimension column on fact table +,LO_QUANTITY bigint +,LO_DISCOUNT bigint + +-- dimension column on dimension table +,C_REGION string +,C_NATION string +,C_CITY string + +,D_YEAR int +,D_YEARMONTH string +,D_WEEKNUMINYEAR int +,D_YEARMONTHNUM int + +,S_REGION string +,S_NATION string +,S_CITY string + +,P_CATEGORY string +,P_BRAND string +,P_MFGR string + + +-- measure column on fact table +,V_REVENUE bigint +,LO_SUPPLYCOST bigint +,LO_REVENUE bigint +,PROFIT_RATIO double + +-- for kylin used +,MINUTE_START timestamp +,HOUR_START timestamp +,MONTH_START date +) +PARTITIONED BY (DAY_START date) +STORED AS SEQUENCEFILE +LOCATION 'hdfs:///LacusDir/lambda_flat_table'; +{% endhighlight %} + + +### Create streaming cube in Kylin +The first step is to add information like the broker list and topic name; +after that, you should paste a sample message on the left and let Kylin auto-detect the column names and column types. +You may find that some data types are not correct; please fix them manually and make sure they are aligned with the data types in the Hive table. + +For example, you should change the data type of event_time from varchar to timestamp. +Some column names are not the same as in the Hive table, so please correct them too, such as `customer_info_REGION` to `C_REGION`. + + + +After that, please choose the right *TSColumn* and *TSParser* and the correct *Table Name*; the table name should be identical to the name of the Hive table. Then click the *submit* button. 
+If all goes well, the table meta info will be saved successfully; otherwise please correct the data types and column names according to the output message. + +When you are creating the Model, please set *Partition Date Column* to the right value. For a streaming cube, *Partition Date Column* is used to generate the HQL for updating segments whose source data comes from Hive. + + +### Check result with timezone + +Let us do a quick check of whether *LOCAL_MINUTE* is aligned with *HOUR_START*. +{% highlight Groff markup %} +SELECT LOCAL_MINUTE, HOUR_START, sum(LO_SUPPLYCOST) +FROM LAMBDA_FLAT_TABLE +WHERE day_start = '2019-12-09' +GROUP BY LOCAL_MINUTE, HOUR_START +ORDER BY LOCAL_MINUTE, HOUR_START +{% endhighlight %} + + + +### Update segment + +1. Use an ETL tool such as Spark Streaming to write the correct data into HDFS, and add a new partition based on your new data files. +2. After that, use the REST API `http://localhost:7070/kylin/api/cubes/{cube_name}/rebuild` [PUT method] to submit a build job to replace old segments; +please add an offset according to the timezone to `startTime` and `endTime` if you have set `kylin.stream.event.timezone`. +3. In some cases, you may want to add a lot of historical data into a Kylin streaming cube for analysis (not to replace anything); you can use the same method. 
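As a rough illustration of step 2 of the "Update segment" list, the Python sketch below only builds the PUT request for the rebuild endpoint; it does not send it. The helper names (`day_start_millis`, `build_rebuild_request`), the cube name, and the credentials are hypothetical. The body fields `startTime`, `endTime` and `buildType` follow Kylin's REST API documentation for this endpoint. Whether the timezone offset should be added or subtracted is an assumption taken literally from the wording above ("add offset"), so please check the resulting range against your own segments.

```python
import base64
import calendar
import json
import urllib.request
from datetime import date

MILLIS_PER_HOUR = 3600 * 1000


def day_start_millis(day, tz_offset_hours=0):
    """Epoch milliseconds of UTC midnight for `day`, plus a timezone offset.

    Assumption: the offset is *added*, following the wording of step 2;
    verify the sign against your own segment ranges.
    """
    utc_midnight_seconds = calendar.timegm(day.timetuple())
    return utc_midnight_seconds * 1000 + tz_offset_hours * MILLIS_PER_HOUR


def build_rebuild_request(base_url, cube_name, start_day, end_day,
                          tz_offset_hours, user, password,
                          build_type="BUILD"):
    """Build (but do not send) the PUT request for the rebuild endpoint.

    `build_type` is one of BUILD, MERGE or REFRESH; which one suits your
    segment update is left for you to confirm.
    """
    body = {
        "startTime": day_start_millis(start_day, tz_offset_hours),
        "endTime": day_start_millis(end_day, tz_offset_hours),
        "buildType": build_type,
    }
    # Kylin's REST API uses HTTP basic authentication.
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/cubes/{cube_name}/rebuild",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical cube name and credentials, for illustration only.
request = build_rebuild_request(
    "http://localhost:7070", "my_lambda_cube",
    date(2019, 12, 9), date(2019, 12, 10),
    tz_offset_hours=8, user="my_user", password="my_password",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually submit the job, the request object could be passed to `urllib.request.urlopen(request)` against a running Kylin instance.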
+ + + + +### Some screenshots + + + + diff --git a/website/images/Hive-Global-Dictionary/add-count-distinct.png b/website/images/Hive-Global-Dictionary/add-count-distinct.png new file mode 100644 index 0000000..ee270d6 Binary files /dev/null and b/website/images/Hive-Global-Dictionary/add-count-distinct.png differ diff --git a/website/images/Hive-Global-Dictionary/cube-level-config.png b/website/images/Hive-Global-Dictionary/cube-level-config.png deleted file mode 100644 index dac474e..0000000 Binary files a/website/images/Hive-Global-Dictionary/cube-level-config.png and /dev/null differ diff --git a/website/images/Hive-Global-Dictionary/hive-global-dict-table.png b/website/images/Hive-Global-Dictionary/hive-global-dict-table.png deleted file mode 100644 index bddc067..0000000 Binary files a/website/images/Hive-Global-Dictionary/hive-global-dict-table.png and /dev/null differ diff --git a/website/images/Hive-Global-Dictionary/new-added-step-1.png b/website/images/Hive-Global-Dictionary/new-added-step-1.png new file mode 100644 index 0000000..55e46bd Binary files /dev/null and b/website/images/Hive-Global-Dictionary/new-added-step-1.png differ diff --git a/website/images/Hive-Global-Dictionary/new-added-step-2.png b/website/images/Hive-Global-Dictionary/new-added-step-2.png new file mode 100644 index 0000000..265da17 Binary files /dev/null and b/website/images/Hive-Global-Dictionary/new-added-step-2.png differ diff --git a/website/images/Hive-Global-Dictionary/set-hive-dict-cloumn.png b/website/images/Hive-Global-Dictionary/set-hive-dict-cloumn.png new file mode 100644 index 0000000..f549ec6 Binary files /dev/null and b/website/images/Hive-Global-Dictionary/set-hive-dict-cloumn.png differ diff --git a/website/images/Hive-Global-Dictionary/set-hive-dict-column.png b/website/images/Hive-Global-Dictionary/set-hive-dict-column.png deleted file mode 100644 index aa7e807..0000000 Binary files a/website/images/Hive-Global-Dictionary/set-hive-dict-column.png and /dev/null 
differ diff --git a/website/images/Hive-Global-Dictionary/three-added-steps.png b/website/images/Hive-Global-Dictionary/three-added-steps.png deleted file mode 100644 index d541c31..0000000 Binary files a/website/images/Hive-Global-Dictionary/three-added-steps.png and /dev/null differ