http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/create_cube.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/create_cube.cn.md b/website/_docs20/tutorial/create_cube.cn.md
new file mode 100644
index 0000000..5c28e11
--- /dev/null
+++ b/website/_docs20/tutorial/create_cube.cn.md
@@ -0,0 +1,129 @@
---
layout: docs20-cn
title: Kylin Cube Creation Tutorial
categories: tutorial
permalink: /cn/docs20/tutorial/create_cube.html
version: v1.2
since: v0.7.1
---


### I. Create a new project
1. Go to the `Query` page from the top menu bar, then click `Manage Projects`.

2. Click the `+ Project` button to add a new project.

3. Fill in the form and click the `submit` button to send the request.

4. After success, a notification will be shown at the bottom.

### II. Sync up a table
1. Click `Tables` in the top menu bar, then click the `+ Sync` button to load hive table metadata.

2. Enter the table names and click the `Sync` button to send the request.

### III. Create a new cube
First, click `Cubes` in the top menu bar, then click the `+Cube` button to enter the cube designer page.

**Step 1. Cube Info**

Fill in the basic cube information. Click `Next` to go to the next step.

You can use letters, numbers and '_' to name your cube (note that spaces are not allowed in the name).

**Step 2. Dimensions**

1. Set up the fact table.

2. Click the `+Dimension` button to add a new dimension.

3. Different types of dimensions can be added to a cube. Here we list some of them for your reference.

   * Dimensions from the fact table.

   * Dimensions from a lookup table.

   * Dimensions from a lookup table with a hierarchy.

   * Dimensions from a lookup table with derived dimensions.

4. A dimension can be edited after it has been saved.

**Step 3. Measures**

1. Click the `+Measure` button to add a new measure.

2. There are 5 types of measures according to the expression: `SUM`, `MAX`, `MIN`, `COUNT` and `COUNT_DISTINCT`. Please choose the return type carefully; it is related to the error rate of `COUNT(DISTINCT)`.
   * SUM

   * MIN

   * MAX

   * COUNT

   * DISTINCT_COUNT

**Step 4. Filter**

This step is optional. You can add condition filters in `SQL` format.

**Step 5. Refresh Setting**

This step is designed for incremental cube builds.

Select the partition type, partition column and start date.

**Step 6. Advanced Setting**

**Step 7. Overview & Save**

You can review your cube and go back to previous steps to make changes. Click the `Save` button to complete the cube creation.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/create_cube.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/create_cube.md b/website/_docs20/tutorial/create_cube.md
new file mode 100644
index 0000000..ea2216b
--- /dev/null
+++ b/website/_docs20/tutorial/create_cube.md
@@ -0,0 +1,198 @@
---
layout: docs20
title: Kylin Cube Creation
categories: tutorial
permalink: /docs20/tutorial/create_cube.html
---

This tutorial will guide you through creating a cube. It assumes you already have at least one sample table in Hive. If you don't, you can follow this to create some data.

### I. Create a Project
1. Go to the `Query` page from the top menu bar, then click `Manage Projects`.

2. Click the `+ Project` button to add a new project.

3. Enter a project name, e.g., "Tutorial", with an optional description, then click the `submit` button to send the request.

4. After success, the project will be shown in the table.

### II. Sync up Hive Tables
1. Click `Model` in the top bar and then click the `Data Source` tab on the left, which lists all the tables loaded into Kylin; click the `Load Hive Table` button.

2. Enter the hive table names, separated by commas, and then click `Sync` to send the request.

3. [Optional] If you want to browse the hive database to pick tables, click the `Load Hive Table From Tree` button.

4. [Optional] Expand the database node, click to select the tables to load, and then click `Sync`.

5. A success message will pop up. In the left `Tables` section, the newly loaded table is added. Clicking the table name will expand its columns.

6. In the background, Kylin will run a MapReduce job to calculate the approximate cardinality of the newly synced table. After the job finishes, refresh the web page and then click the table name; the cardinality will be shown in the table info.


### III. Create Data Model
Before creating a cube, you need to define a data model. The data model defines the star schema. One data model can be reused by multiple cubes.

1. Click `Model` in the top bar, and then click the `Models` tab. Click the `+New` button and select `New Model` from the drop-down list.

2. Enter a name for the model, with an optional description.

3. In the `Fact Table` box, select the fact table of this data model.

4. [Optional] Click the `Add Lookup Table` button to add a lookup table. Select the table name and join type (inner or left).

5. [Optional] Click the `New Join Condition` button, select the FK column of the fact table on the left, and select the PK column of the lookup table on the right. Repeat this if there is more than one join column.

6. Click "OK", and repeat steps 4 and 5 to add more lookup tables if needed. When finished, click "Next".

7. The "Dimensions" page allows you to select the columns that will be used as dimensions in the child cubes. Click the `Columns` cell of a table and select the columns to add from the drop-down list.

8. Click "Next" to go to the "Measures" page and select the columns that will be used in measures/metrics. Measure columns can only come from the fact table.

9. Click "Next" to go to the "Settings" page. If the data in the fact table grows by day, select the corresponding date column as the `Partition Date Column` and select the date format; otherwise leave it blank.

10. [Optional] Select `Cube Size`, which is an indicator of the scale of the cube; by default it is `MEDIUM`.

11. [Optional] If you want to exclude some records from the cube, such as dirty data, you can enter the condition in `Filter`.

12. Click `Save` and then select `Yes` to save the data model. After it is created, the data model will be shown in the left `Models` list.

### IV. Create Cube
After the data model is created, you can start to create a cube.

Click `Model` in the top bar, and then click the `Models` tab. Click the `+New` button and select `New Cube` from the drop-down list.


**Step 1. Cube Info**

Select the data model and enter the cube name; click `Next` to go to the next step.

You can use letters, numbers and '_' to name your cube (blank spaces in the name are not allowed). `Notification List` is a list of email addresses which will be notified on cube job success/failure.


**Step 2. Dimensions**

1. Click `Add Dimension`; it pops up two options: "Normal" and "Derived". "Normal" adds a normal, independent dimension column; "Derived" adds a derived dimension column. Read more in [How to optimize cubes](/docs15/howto/howto_optimize_cubes.html).

2. Click "Normal", then select a dimension column and give it a meaningful name.

3. [Optional] Click "Derived", then pick one or more columns from a lookup table and give them a meaningful name.

4. Repeat 2 and 3 to add all dimension columns; you can do this in batch for "Normal" dimensions with the `Auto Generator` button.

5. Click "Next" after selecting all dimensions.

**Step 3. Measures**

1. Click `+Measure` to add a new measure.

2. There are 6 types of measures according to the expression: `SUM`, `MAX`, `MIN`, `COUNT`, `COUNT_DISTINCT` and `TOP_N`. Select the return type for `COUNT_DISTINCT` and `TOP_N` carefully, as it will impact the cube size.
	* SUM

	* MIN

	* MAX

	* COUNT

	* DISTINCT_COUNT
	This measure has two implementations:
	a) an approximate implementation with HyperLogLog: select an acceptable error rate; a lower error rate takes more storage.
	b) a precise implementation with bitmap (see the limitation in https://issues.apache.org/jira/browse/KYLIN-1186).

	Please note: distinct count is a very heavy data type; it is slower to build and query compared to other measures.

	* TOP_N
	The approximate TopN measure pre-calculates the top records in each dimension combination; it gives much better query performance than having no pre-calculation. Two parameters need to be specified here: the first is the column that will be used as the metric for the top records (aggregated with SUM and then sorted in descending order); the second is the literal ID, which represents the record, like seller_id.

	Select the return type properly, depending on how many top records you want to inspect: top 10, top 100 or top 1000.


**Step 4. Refresh Setting**

This step is designed for incremental cube builds.

`Auto Merge Time Ranges (days)`: merge the small segments into medium and large segments automatically. If you don't want auto merge, remove the two default ranges.

`Retention Range (days)`: only keep the segments whose data falls within the given number of past days; older segments will be automatically dropped from the head. 0 means this feature is disabled.

`Partition Start Date`: the start date of this cube.

**Step 5. Advanced Setting**

`Aggregation Groups`: by default Kylin puts all dimensions into one aggregation group; you can create multiple aggregation groups if you know your query patterns well.
For the concepts of "Mandatory Dimensions", "Hierarchy Dimensions" and "Joint Dimensions", read this blog: [New Aggregation Group](/blog/2016/02/18/new-aggregation-group/)

`Rowkeys`: the rowkeys are composed of the dimensions' encoded values. "Dictionary" is the default encoding method; if a dimension is not a good fit for dictionary encoding (e.g., cardinality > 10 million), select "false" and then enter a fixed length for that dimension, usually the maximum length of that column; if a value is longer than that size it will be truncated. Please note that without dictionary encoding, the cube size might be much bigger.

You can drag & drop a dimension column to adjust its position in the rowkey. Put mandatory dimensions at the beginning, followed by the dimensions that are heavily involved in filters (where conditions). Put high-cardinality dimensions ahead of low-cardinality dimensions.


**Step 6. Overview & Save**

You can review your cube and go back to any previous step to modify it. Click the `Save` button to complete the cube creation.


Cheers! Now the cube is created, and you can go ahead to build and play with it.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/cube_build_job.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/cube_build_job.cn.md b/website/_docs20/tutorial/cube_build_job.cn.md
new file mode 100644
index 0000000..a0b2a6b
--- /dev/null
+++ b/website/_docs20/tutorial/cube_build_job.cn.md
@@ -0,0 +1,66 @@
---
layout: docs20-cn
title: Kylin Cube Build and Job Monitoring Tutorial
categories: tutorial
permalink: /cn/docs20/tutorial/cube_build_job.html
version: v1.2
since: v0.7.1
---

### Cube Build
First, make sure that you have permission on the cube you want to build.

1. In the `Cubes` page, click the `Action` drop-down button on the right side of the cube row and select the `Build` action.

2. A pop-up window appears after the selection.

3. Click the `END DATE` input box to select the end date of this incremental cube build.

4. Click `Submit` to send the request.

   After the request is submitted successfully, you will see the new job in the `Jobs` page.

5. To discard this job, click the `Discard` button.

### Job Monitoring
In the `Jobs` page, click the job detail button to see the detailed information shown on the right side.

The detailed job information provides a step-by-step record for tracing a job. You can hover over a step status icon to see the basic status and information.

Click the icon buttons shown in each step to see the details: `Parameters`, `Log`, `MRJob`, `EagleMonitoring`.

* Parameters

* Log

* MRJob (MapReduce Job)

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/cube_build_job.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/cube_build_job.md b/website/_docs20/tutorial/cube_build_job.md
new file mode 100644
index 0000000..0810c5b
--- /dev/null
+++ b/website/_docs20/tutorial/cube_build_job.md
@@ -0,0 +1,67 @@
---
layout: docs20
title: Kylin Cube Build and Job Monitoring
categories: tutorial
permalink: /docs20/tutorial/cube_build_job.html
---

### Cube Build
First of all, make sure that you have permission on the cube you want to build.

1. In the `Models` page, click the `Action` drop-down button on the right of a cube row and select the `Build` operation.

2. A pop-up window appears after the selection; click the `END DATE` input box to select the end date of this incremental cube build.

3. Click `Submit` to send the build request. After success, you will see the new job in the `Monitor` page.

The new job is in "pending" status; after a while, it will be started to run and you will see the progress by refresh the web page or click the refresh button. + +  + + +6. Wait the job to finish. In the between if you want to discard it, click `Actions` -> `Discard` button. + +  + +7. After the job is 100% finished, the cube's status becomes to "Ready", means it is ready to serve SQL queries. In the `Model` tab, find the cube, click cube name to expand the section, in the "HBase" tab, it will list the cube segments. Each segment has a start/end time; Its underlying HBase table information is also listed. + +  + +If you have more source data, repeate the steps above to build them into the cube. + +### Job Monitoring +In the `Monitor` page, click the job detail button to see detail information show in the right side. + + + +The detail information of a job provides a step-by-step record to trace a job. You can hover a step status icon to see the basic status and information. + + + +Click the icon buttons showing in each step to see the details: `Parameters`, `Log`, `MRJob`. + +* Parameters + +  + +  + +* Log + +  + +  + +* MRJob(MapReduce Job) + +  + +  + + http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/cube_spark.md ---------------------------------------------------------------------- diff --git a/website/_docs20/tutorial/cube_spark.md b/website/_docs20/tutorial/cube_spark.md new file mode 100644 index 0000000..5f7893a --- /dev/null +++ b/website/_docs20/tutorial/cube_spark.md @@ -0,0 +1,166 @@ +--- +layout: docs20 +title: Build Cube with Spark (beta) +categories: tutorial +permalink: /docs20/tutorial/cube_spark.html +--- +Kylin v2.0 introduces the Spark cube engine, it uses Apache Spark to replace MapReduce in the build cube step; You can check [this blog](/blog/2017/02/23/by-layer-spark-cubing/) for an overall picture. The current document uses the sample cube to demo how to try the new engine. + +## Preparation +To finish this tutorial, you need a Hadoop environment which has Kylin v2.0.0 or above installed. Here we will use Hortonworks HDP 2.4 Sandbox VM, the Hadoop components as well as Hive/HBase has already been started. + +## Install Kylin v2.0.0 beta + +Download the Kylin v2.0.0 beta for HBase 1.x from Kylin's download page, and then uncompress the tar ball into */usr/local/* folder: + +{% highlight Groff markup %} + +wget https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-2.0.0-beta/apache-kylin-2.0.0-beta-hbase1x.tar.gz -P /tmp + +tar -zxvf /tmp/apache-kylin-2.0.0-beta-hbase1x.tar.gz -C /usr/local/ + +export KYLIN_HOME=/usr/local/apache-kylin-2.0.0-SNAPSHOT-bin +{% endhighlight %} + +## Prepare "kylin.env.hadoop-conf-dir" + +To run Spark on Yarn, need specify **HADOOP_CONF_DIR** environment variable, which is the directory that contains the (client side) configuration files for Hadoop. In many Hadoop distributions the directory is "/etc/hadoop/conf"; But Kylin not only need access HDFS, Yarn and Hive, but also HBase, so the default directory might not have all necessary files. In this case, you need create a new directory and then copying or linking those client files (core-site.xml, yarn-site.xml, hive-site.xml and hbase-site.xml) there. In HDP 2.4, there is a conflict between hive-tez and Spark, so need change the default engine from "tez" to "mr" when copy for Kylin. 
+ +{% highlight Groff markup %} + +mkdir $KYLIN_HOME/hadoop-conf +ln -s /etc/hadoop/conf/core-site.xml $KYLIN_HOME/hadoop-conf/core-site.xml +ln -s /etc/hadoop/conf/yarn-site.xml $KYLIN_HOME/hadoop-conf/yarn-site.xml +ln -s /etc/hbase/2.4.0.0-169/0/hbase-site.xml $KYLIN_HOME/hadoop-conf/hbase-site.xml +cp /etc/hive/2.4.0.0-169/0/hive-site.xml $KYLIN_HOME/hadoop-conf/hive-site.xml +vi $KYLIN_HOME/hadoop-conf/hive-site.xml (change "hive.execution.engine" value from "tez" to "mr") + +{% endhighlight %} + +Now, let Kylin know this directory with property "kylin.env.hadoop-conf-dir" in kylin.properties: + +{% highlight Groff markup %} +kylin.env.hadoop-conf-dir=/usr/local/apache-kylin-2.0.0-SNAPSHOT-bin/hadoop-conf +{% endhighlight %} + +If this property isn't set, Kylin will use the directory that "hive-site.xml" locates in; while that folder may have no "hbase-site.xml", will get HBase/ZK connection error in Spark. + +## Check Spark configuration + +Kylin embedes a Spark binary (v1.6.3) in $KYLIN_HOME/spark, all the Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with prefix *"kylin.engine.spark-conf."*. These properties will be extracted and applied when runs submit Spark job; E.g, if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as parameter when execute "spark-submit". + +Before you run Spark cubing, suggest take a look on these configurations and do customization according to your cluster. Below is the default configurations, which is also the minimal config for a sandbox (1 executor with 1GB memory); usually in a normal cluster, need much more executors and each has at least 4GB memory and 2 cores: + +{% highlight Groff markup %} +kylin.engine.spark-conf.spark.master=yarn +kylin.engine.spark-conf.spark.submit.deployMode=cluster +kylin.engine.spark-conf.spark.yarn.queue=default +kylin.engine.spark-conf.spark.executor.memory=1G +kylin.engine.spark-conf.spark.executor.cores=2 +kylin.engine.spark-conf.spark.executor.instances=1 +kylin.engine.spark-conf.spark.eventLog.enabled=true +kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history +kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history +#kylin.engine.spark-conf.spark.yarn.jar=hdfs://namenode:8020/kylin/spark/spark-assembly-1.6.3-hadoop2.6.0.jar +#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec + +## uncomment for HDP +#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current +#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current +#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current + +{% endhighlight %} + +For running on Hortonworks platform, need specify "hdp.version" as Java options for Yarn containers, so please uncommment the last three lines in kylin.properties. + +Besides, in order to avoid repeatedly uploading Spark assembly jar to Yarn, you can manually do that once, and then configure the jar's HDFS location; Please note, the HDFS location need be full qualified name. 
+ +{% highlight Groff markup %} +hadoop fs -mkdir -p /kylin/spark/ +hadoop fs -put $KYLIN_HOME/spark/lib/spark-assembly-1.6.3-hadoop2.6.0.jar /kylin/spark/ +{% endhighlight %} + +After do that, the config in kylin.properties will be: +{% highlight Groff markup %} +kylin.engine.spark-conf.spark.yarn.jar=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-assembly-1.6.3-hadoop2.6.0.jar +kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current +kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current +kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current +{% endhighlight %} + +All the "kylin.engine.spark-conf.*" parameters can be overwritten at Cube or Project level, this gives more flexibility to the user. + +## Create and modify sample cube + +Run the sample.sh to create the sample cube, and then start Kylin server: + +{% highlight Groff markup %} + +$KYLIN_HOME/bin/sample.sh +$KYLIN_HOME/bin/kylin.sh start + +{% endhighlight %} + +After Kylin is started, access Kylin web, edit the "kylin_sales" cube, in the "Advanced Setting" page, change the "Cube Engine" from "MapReduce" to "Spark (Beta)": + + +  + +Click "Next" to the "Configuration Overwrites" page, click "+Property" to add property "kylin.engine.spark.rdd-partition-cut-mb" with value "100" (reasons below): + +  + +The sample cube has two memory hungry measures: a "COUNT DISTINCT" and a "TOPN(100)"; Their size estimation can be inaccurate when the source data is small: the estimized size is much larger than the real size, that causes much more RDD partitions be splitted, which slows down the build. Here 100 is a more reasonable number for it. Click "Next" and "Save" to save the cube. + + +## Build Cube with Spark + +Click "Build", select current date as the build end date. Kylin generates a build job in the "Monitor" page, in which the 7th step is the Spark cubing. The job engine starts to execute the steps in sequence. + + +  + + +  + +When Kylin executes this step, you can monitor the status in Yarn resource manager. Click the "Application Master" link will open Spark web UI, it shows the progress of each stage and the detailed information. + + +  + + +  + + +After all steps be successfully executed, the Cube becomes "Ready" and you can query it as normal. + +## Troubleshooting + +When getting error, you should check "logs/kylin.log" firstly. 
There has the full Spark command that Kylin executes, e.g: + +{% highlight Groff markup %} +2017-03-06 14:44:38,574 INFO [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/usr/local/apache-kylin-2.0.0-SNAPSHOT-bin/hadoop-conf && /usr/local/apache-kylin-2.0.0-SNAPSHOT-bin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry --conf spark.executor.instances=1 --conf spark.yarn.jar=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-assembly-1.6.3-hadoop2.6.0.jar --conf spark.yarn.queue=default --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history --conf spark.driver.extraJavaOptions=-Dhdp.version=current --conf spark.master=yarn --conf spark.executor.extraJavaOptions=-Dhdp.version=current --conf spark.executor.memory=1G --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs:///kylin/spark-history --conf spark.executor.cores=2 --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml --jars /usr/local/apache-kylin-2.0.0-SNAPSHOT-bin/spark/lib/spark-assembly-1.6.3-hadoop2.6.0.jar,/usr/hdp/2.4.0.0-169/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.4.0.0-169/hbase/lib/hbase-client-1.1.2.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hbase/lib/hbase-common-1.1.2.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hbase/lib/hbase-protocol-1.1.2.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hbase/lib/metrics-core-2.2.0.jar,/usr/hdp/2.4.0.0-169/hbase/lib/guava-12.0.1.jar, /usr/local/apache-kylin-2.0.0-SNAPSHOT-bin/lib/kylin-job-2.0.0-SNAPSHOT.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.0.0-SNAPSHOT-bin/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube + +{% endhighlight %} + +You can copy the cmd to execute manually in shell and then tunning the parameters quickly; During the execution, you can access Yarn resource manager to check more. If the job has already finished, you can check the history info in Spark history server. + +By default Kylin outputs the history to "hdfs:///kylin/spark-history", you need start Spark history server on that directory, or change to use your existing Spark history server's event directory in conf/kylin.properties with parameter "kylin.engine.spark-conf.spark.eventLog.dir" and "kylin.engine.spark-conf.spark.history.fs.logDirectory". + +The following command will start a Spark history server instance on Kylin's output directory, before run it making sure you have stopped the existing Spark history server in sandbox: + +{% highlight Groff markup %} +$KYLIN_HOME/spark/sbin/start-history-server.sh hdfs://sandbox.hortonworks.com:8020/kylin/spark-history +{% endhighlight %} + +In web browser, access "http://sandbox:18080" it shows the job history: + +  + +Click a specific job, there you will see the detail runtime information, that is very helpful for trouble shooting and performance tuning. + +## Go further + +If you're a Kylin administrator but new to Spark, suggest you go through [Spark documents](https://spark.apache.org/docs/1.6.3/), and don't forget to update the configurations accordingly. 
Spark's performance relies on Cluster's memory and CPU resource, while Kylin's Cube build is a heavy task when having a complex data model and a huge dataset to build at one time. If your cluster resource couldn't fulfill, errors like "OutOfMemorry" will be thrown in Spark executors, so please use it properly. For Cube which has UHC dimension, many combinations (e.g, a full cube with more than 12 dimensions), or memory hungry measures (Count Distinct, Top-N), suggest to use the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT, source data is small to medium scale, Spark engine would be a good choice. Besides, Streaming build isn't supported in this engine so far (KYLIN-2484). + +Now the Spark engine is in public beta; If you have any question, comment, or bug fix, welcome to discuss in d...@kylin.apache.org. http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/cube_streaming.md ---------------------------------------------------------------------- diff --git a/website/_docs20/tutorial/cube_streaming.md b/website/_docs20/tutorial/cube_streaming.md new file mode 100644 index 0000000..08e5bf9 --- /dev/null +++ b/website/_docs20/tutorial/cube_streaming.md @@ -0,0 +1,219 @@ +--- +layout: docs20 +title: Scalable Cubing from Kafka (beta) +categories: tutorial +permalink: /docs20/tutorial/cube_streaming.html +--- +Kylin v1.6 releases the scalable streaming cubing function, it leverages Hadoop to consume the data from Kafka to build the cube, you can check [this blog](/blog/2016/10/18/new-nrt-streaming/) for the high level design. This doc is a step by step tutorial, illustrating how to create and build a sample cube; + +## Preparation +To finish this tutorial, you need a Hadoop environment which has kylin v1.6.0 or above installed, and also have a Kafka (v0.10.0 or above) running; Previous Kylin version has a couple issues so please upgrade your Kylin instance at first. + +In this tutorial, we will use Hortonworks HDP 2.2.4 Sandbox VM + Kafka v0.10.0(Scala 2.10) as the environment. + +## Install Kafka 0.10.0.0 and Kylin +Don't use HDP 2.2.4's build-in Kafka as it is too old, stop it first if it is running. +{% highlight Groff markup %} +curl -s http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz | tar -xz -C /usr/local/ + +cd /usr/local/kafka_2.10-0.10.0.0/ + +bin/kafka-server-start.sh config/server.properties & + +{% endhighlight %} + +Download the Kylin v1.6 from download page, expand the tar ball in /usr/local/ folder. + +## Create sample Kafka topic and populate data + +Create a sample topic "kylindemo", with 3 partitions: + +{% highlight Groff markup %} + +bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic kylindemo +Created topic "kylindemo". +{% endhighlight %} + +Put sample data to this topic; Kylin has an utility class which can do this; + +{% highlight Groff markup %} +export KAFKA_HOME=/usr/local/kafka_2.10-0.10.0.0 +export KYLIN_HOME=/usr/local/apache-kylin-1.6.0-bin + +cd $KYLIN_HOME +./bin/kylin.sh org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic kylindemo --broker localhost:9092 +{% endhighlight %} + +This tool will send 100 records to Kafka every second. Please keep it running during this tutorial. 
You can check the sample message with kafka-console-consumer.sh now: + +{% highlight Groff markup %} +cd $KAFKA_HOME +bin/kafka-console-consumer.sh --zookeeper localhost:2181 --bootstrap-server localhost:9092 --topic kylindemo --from-beginning +{"amount":63.50375137330458,"category":"TOY","order_time":1477415932581,"device":"Other","qty":4,"user":{"id":"bf249f36-f593-4307-b156-240b3094a1c3","age":21,"gender":"Male"},"currency":"USD","country":"CHINA"} +{"amount":22.806058795736583,"category":"ELECTRONIC","order_time":1477415932591,"device":"Andriod","qty":1,"user":{"id":"00283efe-027e-4ec1-bbed-c2bbda873f1d","age":27,"gender":"Female"},"currency":"USD","country":"INDIA"} + + {% endhighlight %} + +## Define a table from streaming +Start Kylin server with "$KYLIN_HOME/bin/kylin.sh start", login Kylin Web GUI at http://sandbox:7070/kylin/, select an existing project or create a new project; Click "Model" -> "Data Source", then click the icon "Add Streaming Table"; + +  + +In the pop-up dialogue, enter a sample record which you got from the kafka-console-consumer, click the ">>" button, Kylin parses the JSON message and listS all the properties; + +You need give a logic table name for this streaming data source; The name will be used for SQL query later; here enter "STREAMING_SALES_TABLE" as an example in the "Table Name" field. + +You need select a timestamp field which will be used to identify the time of a message; Kylin can derive other time values like "year_start", "quarter_start" from this time column, which can give your more flexibility on building and querying the cube. Here check "order_time". You can deselect those properties which are not needed for cube. Here let's keep all fields. + +Notice that Kylin supports structured (or say "embedded") message from v1.6, it will convert them into a flat table structure. By default use "_" as the separator of the structed properties. + +  + + +Click "Next". On this page, provide the Kafka cluster information; Enter "kylindemo" as "Topic" name; The cluster has 1 broker, whose host name is "sandbox", port is "9092", click "Save". + +  + +In "Advanced setting" section, the "timeout" and "buffer size" are the configurations for connecting with Kafka, keep them. + +In "Parser Setting", by default Kylin assumes your message is JSON format, and each record's timestamp column (specified by "tsColName") is a bigint (epoch time) value; in this case, you just need set the "tsColumn" to "order_time"; + + + +In real case if the timestamp value is a string valued timestamp like "Jul 20, 2016 9:59:17 AM", you need specify the parser class with "tsParser" and the time pattern with "tsPattern" like this: + + + + +Click "Submit" to save the configurations. Now a "Streaming" table is created. + + + +## Define data model +With the table defined in previous step, now we can create the data model. The step is almost the same as you create a normal data model, but it has two requirement: + +* Streaming Cube doesn't support join with lookup tables; When define the data model, only select fact table, no lookup table; +* Streaming Cube must be partitioned; If you're going to build the Cube incrementally at minutes level, select "MINUTE_START" as the cube's partition date column. If at hours level, select "HOUR_START". + +Here we pick 13 dimension and 2 measure columns: + + + + +Save the data model. + +## Create Cube + +The streaming Cube is almost the same as a normal cube. 
a couple of points need get your attention: + +* The partition time column should be a dimension of the Cube. In Streaming OLAP the time is always a query condition, and Kylin will leverage this to narrow down the scanned partitions. +* Don't use "order\_time" as dimension as that is pretty fine-grained; suggest to use "mintue\_start", "hour\_start" or other, depends on how you will inspect the data. +* Define "year\_start", "quarter\_start", "month\_start", "day\_start", "hour\_start", "minute\_start" as a hierarchy to reduce the combinations to calculate. +* In the "refersh setting" step, create more merge ranges, like 0.5 hour, 4 hours, 1 day, and then 7 days; This will help to control the cube segment number. +* In the "rowkeys" section, drag&drop the "minute\_start" to the head position, as for streaming queries, the time condition is always appeared; putting it to head will help to narrow down the scan range. + +  + +  + +  + +  + +Save the cube. + +## Run a build + +You can trigger the build from web GUI, by clicking "Actions" -> "Build", or sending a request to Kylin RESTful API with 'curl' command: + +{% highlight Groff markup %} +curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:7070/kylin/api/cubes/{your_cube_name}/build2 +{% endhighlight %} + +Please note the API endpoint is different from a normal cube (this URL end with "build2"). + +Here 0 means from the last position, and 9223372036854775807 (Long.MAX_VALUE) means to the end position on Kafka topic. If it is the first time to build (no previous segment), Kylin will seek to beginning of the topics as the start position. + +In the "Monitor" page, a new job is generated; Wait it 100% finished. + +## Click the "Insight" tab, compose a SQL to run, e.g: + + {% highlight Groff markup %} +select minute_start, count(*), sum(amount), sum(qty) from streaming_sales_table group by minute_start order by minute_start + {% endhighlight %} + +The result looks like below. + + + +## Automate the build + +Once the first build and query got successfully, you can schedule incremental builds at a certain frequency. Kylin will record the offsets of each build; when receive a build request, it will start from the last end position, and then seek the latest offsets from Kafka. With the REST API you can trigger it with any scheduler tools like Linux cron: + + {% highlight Groff markup %} +crontab -e +*/5 * * * * curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:7070/kylin/api/cubes/{your_cube_name}/build2 + {% endhighlight %} + +Now you can site down and watch the cube be automatically built from streaming. And when the cube segments accumulate to bigger time range, Kylin will automatically merge them into a bigger segment. 
+ +## Trouble shootings + + * You may encounter the following error when run "kylin.sh": +{% highlight Groff markup %} +Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/clients/producer/Producer + at java.lang.Class.getDeclaredMethods0(Native Method) + at java.lang.Class.privateGetDeclaredMethods(Class.java:2615) + at java.lang.Class.getMethod0(Class.java:2856) + at java.lang.Class.getMethod(Class.java:1668) + at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494) + at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486) +Caused by: java.lang.ClassNotFoundException: org.apache.kafka.clients.producer.Producer + at java.net.URLClassLoader$1.run(URLClassLoader.java:366) + at java.net.URLClassLoader$1.run(URLClassLoader.java:355) + at java.security.AccessController.doPrivileged(Native Method) + at java.net.URLClassLoader.findClass(URLClassLoader.java:354) + at java.lang.ClassLoader.loadClass(ClassLoader.java:425) + at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) + at java.lang.ClassLoader.loadClass(ClassLoader.java:358) + ... 6 more +{% endhighlight %} + +The reason is Kylin wasn't able to find the proper Kafka client jars; Make sure you have properly set "KAFKA_HOME" environment variable. + + * Get "killed by admin" error in the "Build Cube" step + + Within a Sandbox VM, YARN may not allocate the requested memory resource to MR job as the "inmem" cubing algorithm requests more memory. You can bypass this by requesting less memory: edit "conf/kylin_job_conf_inmem.xml", change the following two parameters like this: + + {% highlight Groff markup %} + <property> + <name>mapreduce.map.memory.mb</name> + <value>1072</value> + <description></description> + </property> + + <property> + <name>mapreduce.map.java.opts</name> + <value>-Xmx800m</value> + <description></description> + </property> + {% endhighlight %} + + * If there already be bunch of history messages in Kafka and you don't want to build from the very beginning, you can trigger a call to set the current end position as the start for the cube: + +{% highlight Groff markup %} +curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:7070/kylin/api/cubes/{your_cube_name}/init_start_offsets +{% endhighlight %} + + * If some build job got error and you discard it, there will be a hole (or say gap) left in the Cube. Since each time Kylin will build from last position, you couldn't expect the hole be filled by normal builds. 
Kylin provides an API to check and fill the holes.

Check holes:
 {% highlight Groff markup %}
curl -X GET --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" http://localhost:7070/kylin/api/cubes/{your_cube_name}/holes
{% endhighlight %}

If the result is an empty array, there is no hole; otherwise, trigger Kylin to fill the holes:
 {% highlight Groff markup %}
curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" http://localhost:7070/kylin/api/cubes/{your_cube_name}/holes
{% endhighlight %}

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/flink.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/flink.md b/website/_docs20/tutorial/flink.md
new file mode 100644
index 0000000..d74f602
--- /dev/null
+++ b/website/_docs20/tutorial/flink.md
@@ -0,0 +1,249 @@
---
layout: docs20
title: Connect from Apache Flink
categories: tutorial
permalink: /docs20/tutorial/flink.html
---


### Introduction

This document describes how to use Kylin as a data source in Apache Flink.

There have been several attempts to do this with Scala and JDBC, but none of them work:

* [attempt1](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/JDBCInputFormat-preparation-with-Flink-1-1-SNAPSHOT-and-Scala-2-11-td5371.html)
* [attempt2](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Type-of-TypeVariable-OT-in-class-org-apache-flink-api-common-io-RichInputFormat-could-not-be-determi-td7287.html)
* [attempt3](http://stackoverflow.com/questions/36067881/create-dataset-from-jdbc-source-in-flink-using-scala)
* [attempt4](https://codegists.com/snippet/scala/jdbcissuescala_zeitgeist_scala)

We will try to use CreateInput and [JDBCInputFormat](https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html) in batch mode and access Kylin via JDBC. But it isn't implemented in Scala, only in Java ([MailList](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/jdbc-JDBCInputFormat-td9393.html)). This doc will solve these problems step by step.

### Pre-requisites

* You need an instance of Kylin with a Cube; the [Sample Cube](kylin_sample.html) will be good enough.
+* [Scala](http://www.scala-lang.org/) and [Apache Flink](http://flink.apache.org/) Installed +* [IntelliJ](https://www.jetbrains.com/idea/) Installed and configured for Scala/Flink (see [Flink IDE setup guide](https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/ide_setup.html) ) + +### Used software: + +* [Apache Flink](http://flink.apache.org/downloads.html) v1.2-SNAPSHOT +* [Apache Kylin](http://kylin.apache.org/download/) v1.5.2 (v1.6.0 also works) +* [IntelliJ](https://www.jetbrains.com/idea/download/#section=linux) v2016.2 +* [Scala](downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz) v2.11 + +### Starting point: + +This can be out initial skeleton: + +{% highlight Groff markup %} +import org.apache.flink.api.scala._ +val env = ExecutionEnvironment.getExecutionEnvironment +val inputFormat = JDBCInputFormat.buildJDBCInputFormat() + .setDrivername("org.apache.kylin.jdbc.Driver") + .setDBUrl("jdbc:kylin://172.17.0.2:7070/learn_kylin") + .setUsername("ADMIN") + .setPassword("KYLIN") + .setQuery("select count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt") + .finish() + val dataset =env.createInput(inputFormat) +{% endhighlight %} + +The first error is:  + +Add to Scala: +{% highlight Groff markup %} +import org.apache.flink.api.java.io.jdbc.JDBCInputFormat +{% endhighlight %} + +Next error is  + +We can solve dependencies [(mvn repository: jdbc)](https://mvnrepository.com/artifact/org.apache.flink/flink-jdbc/1.1.2); Add this to your pom.xml: +{% highlight Groff markup %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-jdbc</artifactId> + <version>${flink.version}</version> +</dependency> +{% endhighlight %} + +## Solve dependencies of row + +Similar to previous point we need solve dependencies of Row Class [(mvn repository: Table) ](https://mvnrepository.com/artifact/org.apache.flink/flink-table_2.10/1.1.2): + +  + + +* In pom.xml add: +{% highlight Groff markup %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-table_2.10</artifactId> + <version>${flink.version}</version> +</dependency> +{% endhighlight %} + +* In Scala: +{% highlight Groff markup %} +import org.apache.flink.api.table.Row +{% endhighlight %} + +## Solve RowTypeInfo property (and their new dependencies) + +This is the new error to solve: + +  + + +* If check the code of [JDBCInputFormat.java](https://github.com/apache/flink/blob/master/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java#L69), we can see [this new property](https://github.com/apache/flink/commit/09b428bd65819b946cf82ab1fdee305eb5a941f5#diff-9b49a5041d50d9f9fad3f8060b3d1310R69) (and mandatory) added on Apr 2016 by [FLINK-3750](https://issues.apache.org/jira/browse/FLINK-3750) Manual [JDBCInputFormat](https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.html) v1.2 in Java + + Add the new Property: **setRowTypeInfo** + +{% highlight Groff markup %} +val inputFormat = JDBCInputFormat.buildJDBCInputFormat() + .setDrivername("org.apache.kylin.jdbc.Driver") + .setDBUrl("jdbc:kylin://172.17.0.2:7070/learn_kylin") + .setUsername("ADMIN") + .setPassword("KYLIN") + .setQuery("select count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt") + .setRowTypeInfo(DB_ROWTYPE) + .finish() +{% endhighlight %} + +* How can configure this property in Scala? 
In [Attempt4](https://codegists.com/snippet/scala/jdbcissuescala_zeitgeist_scala) there is an incorrect solution.

  We can check the types using the IntelliSense:

  Then we will need to add more dependencies; add to Scala:

{% highlight Groff markup %}
import org.apache.flink.api.table.typeutils.RowTypeInfo
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
{% endhighlight %}

  Create an Array or Seq of TypeInformation[ ]

  Solution:

{% highlight Groff markup %}
  var stringColum: TypeInformation[String] = createTypeInformation[String]
  val DB_ROWTYPE = new RowTypeInfo(Seq(stringColum))
{% endhighlight %}

## Solve ClassNotFoundException

We need to find the kylin-jdbc-x.x.x.jar and then expose it to Flink.

1. Find the Kylin JDBC jar

   From the Kylin [Download](http://kylin.apache.org/download/) page choose **Binary** and the **correct version of Kylin and HBase**.

   Download & unpack: the jar is in ./lib:

2. Make this JAR accessible to Flink

   If you run Flink as a service, you need to put this JAR on your Java class path using your .bashrc.

   Check the actual value:

   Check the permissions of this file (it must be accessible to you):

   If you are executing from the IDE, you need to add it to your class path manually:

   On IntelliJ (see the screenshots):

   The result will be similar to this:

## Solve "Couldn't access resultSet" error

It is related to [Flink 4108](https://issues.apache.org/jira/browse/FLINK-4108) [(MailList)](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/jdbc-JDBCInputFormat-td9393.html#a9415), and Timo Walther [made a PR](https://github.com/apache/flink/pull/2619).

If you are running Flink <= 1.2 you will need to apply this patch and do a clean install.

## Solve the casting error

In the error message you have both the problem and the solution.... nice ;)

## The result

The output should be similar to this, printing the result of the query to standard output:

## Now, more complex

Try a multi-column and multi-type query:

{% highlight Groff markup %}
select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
from kylin_sales
group by part_dt
order by part_dt
{% endhighlight %}

This needs changes in DB_ROWTYPE:

And import the Java lib, to work with Java data types.

The new result will be:

## Error: Reused Connection

Check whether your HBase and Kylin are working. You can also use the Kylin UI for this.

## Error: java.lang.AbstractMethodError: ....Avatica Connection

See [Kylin 1898](https://issues.apache.org/jira/browse/KYLIN-1898)

It is a problem with the kylin-jdbc-1.x.x JAR; you need Calcite 1.8 or above. The solution is to use Kylin 1.5.4 or above.

## Error: can't expand macros compiled by previous versions of scala

This is a problem with Scala versions; check your actual version with "scala -version" and choose the correct POM.

Perhaps you will need IntelliJ > File > Invalidate Caches > Invalidate and Restart.

I added a POM for Scala 2.11.


## Final Words

Now you can read Kylin's data from Apache Flink, great!

[Full Code Example](https://github.com/albertoRamon/Flink/tree/master/ReadKylinFromFlink/flink-scala-project)

All integration problems are solved, and it has been tested with different types of data (Long, BigDecimal and Dates). The patch was committed on 15 Oct and will be part of Flink 1.2.
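For convenience, below is a minimal sketch that assembles the snippets above into one program. The object/main wrapper and the final `print()` call are additions for illustration; the driver class, URL, credentials and query are the ones used in the examples of this doc, so adjust them to your own environment. The tested version lives in the Full Code Example repository linked above.

{% highlight Groff markup %}
import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
import org.apache.flink.api.table.typeutils.RowTypeInfo
import org.apache.flink.api.common.typeinfo.TypeInformation

object ReadKylinFromFlink {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // A single string-typed column, matching the one-column query below
    val stringColum: TypeInformation[String] = createTypeInformation[String]
    val DB_ROWTYPE = new RowTypeInfo(Seq(stringColum))

    val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
      .setDrivername("org.apache.kylin.jdbc.Driver")
      .setDBUrl("jdbc:kylin://172.17.0.2:7070/learn_kylin")  // host/project taken from the examples above
      .setUsername("ADMIN")
      .setPassword("KYLIN")
      .setQuery("select count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt")
      .setRowTypeInfo(DB_ROWTYPE)
      .finish()

    // Read from Kylin and print the result set to standard output
    val dataset = env.createInput(inputFormat)
    dataset.print()
  }
}
{% endhighlight %}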
http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/kylin_client_tool.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/kylin_client_tool.cn.md b/website/_docs20/tutorial/kylin_client_tool.cn.md
new file mode 100644
index 0000000..7100b19
--- /dev/null
+++ b/website/_docs20/tutorial/kylin_client_tool.cn.md
@@ -0,0 +1,97 @@
---
layout: docs20-cn
title: Kylin Client Tool Tutorial
categories: tutorial
permalink: /cn/docs20/tutorial/kylin_client_tool.html
---

> Kylin-client-tool is a tool written in Python, built entirely on Kylin's REST API. It supports cube creation, scheduled cube builds, and submitting, scheduling, viewing, cancelling and resuming jobs.

## Installation
1. Make sure Python 2.6/2.7 is installed in the runtime environment.

2. The tool requires the third-party Python packages apscheduler and requests. Run setup.sh to install them (Mac users run setup-mac.sh); they can also be installed with setuptools.

## Configuration
Edit the settings/settings.py file in the tool directory to configure it:

`KYLIN_USER` Kylin user name

`KYLIN_PASSWORD` Kylin password

`KYLIN_REST_HOST` Kylin host address

`KYLIN_REST_PORT` Kylin port

`KYLIN_JOB_MAX_COCURRENT` the number of jobs allowed to build concurrently

`KYLIN_JOB_MAX_RETRY` the number of job restarts allowed after a cube build error

## Command line usage
The tool uses optparse to run operations from the command line; see `python kylin_client_tool.py -h` for details.

## Creating cubes
The tool defines a CSV text format for creating cubes quickly, as follows:

`cube_name|fact_table_name|dimension1,dimension1_type;dimension2,dimension2_type...|measure1,measure1_expression,measure1_type...|settings|filter|`

The settings support the following options:

`no_dictionary` set the Rowkeys dimensions that do not use a dictionary, and their lengths

`mandatory_dimension` set the mandatory dimensions in the Rowkeys

`aggregation_group` set the aggregation group

`partition_date_column` set the partition date column

`partition_date_start` set the partition start date

See the cube_def.csv file for concrete examples. Creating cubes with lookup tables is currently not supported.

Use the `-c` command to create cubes, and `-F` to specify the cube definition file, for example:

`python kylin_client_tool.py -c -F cube_def.csv`

## Building cubes
### Build with a cube definition file
Use the `-b` command, with `-F` specifying the cube definition file. If a partition date column is specified, use `-T` to specify the end date (in year-month-day format); if it is not specified, the current time is used as the end date, for example:

`python kylin_client_tool.py -b -F cube_def.csv -T 2016-03-01`

### Build with a cube name file
Use `-f` to specify a cube name file, with one cube name per line:

`python kylin_client_tool.py -b -f cube_names.csv -T 2016-03-01`

### Build with cube names on the command line
Use `-C` to specify the cube names, separated by commas:

`python kylin_client_tool.py -b -C client_tool_test1,client_tool_test2 -T 2016-03-01`

## Job management
### View job status
Use the `-s` command to view job status, with `-f` specifying a cube name file or `-C` specifying cube names; if neither is given, the status of all cubes is shown. Use `-S` to filter by job status: R for `Running`, E for `Error`, F for `Finished`, D for `Discarded`, for example:

`python kylin_client_tool.py -s -C kylin_sales_cube -f cube_names.csv -S F`

### Resume jobs
Use the `-r` command to resume jobs, with `-f` specifying a cube name file or `-C` specifying cube names; if neither is given, all jobs in Error status will be resumed, for example:

`python kylin_client_tool.py -r -C kylin_sales_cube -f cube_names.csv`

### Cancel jobs
Use the `-k` command to cancel jobs, with `-f` specifying a cube name file or `-C` specifying cube names; if neither is given, all jobs in Running or Error status will be cancelled, for example:

`python kylin_client_tool.py -k -C kylin_sales_cube -f cube_names.csv`

## Scheduled cube builds
### Build a cube at a fixed interval
On top of the cube build command, use `-B i` to build at a fixed interval, and `-O` to specify the interval in hours, for example:

`python kylin_client_tool.py -b -F cube_def.csv -B i -O 1`

### Build a cube at a given time
Use `-B t` to build the cube at a given time, and `-O` to specify the build time, separated by commas:

`python kylin_client_tool.py -b -F cube_def.csv -T 2016-03-04 -B t -O 2016,3,1,0,0,0`
http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/kylin_sample.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/kylin_sample.md b/website/_docs20/tutorial/kylin_sample.md
new file mode 100644
index 0000000..d083f10
--- /dev/null
+++ b/website/_docs20/tutorial/kylin_sample.md
@@ -0,0 +1,21 @@
---
layout: docs20
title: Quick Start with Sample Cube
categories: tutorial
permalink: /docs20/tutorial/kylin_sample.html
---

Kylin provides a script for you to create a sample cube; the script will also create three sample hive tables:

1. Run ${KYLIN_HOME}/bin/sample.sh; restart the Kylin server to flush the caches;
2. Log in to the Kylin web with the default user ADMIN/KYLIN, and select the project "learn_kylin" in the project drop-down list (upper left corner);
3. Select the sample cube "kylin_sales_cube", click "Actions" -> "Build", and pick a date later than 2014-01-01 (to cover all 10000 sample records);
4. Check the build progress in the "Monitor" tab until it reaches 100%;
5. Execute SQL in the "Insight" tab, for example:
	select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt
6. You can verify the query result and compare the response time with Hive;


## What's next

You can create another cube with the sample tables by following the tutorials.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/odbc.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/odbc.cn.md b/website/_docs20/tutorial/odbc.cn.md
new file mode 100644
index 0000000..665b824
--- /dev/null
+++ b/website/_docs20/tutorial/odbc.cn.md
@@ -0,0 +1,34 @@
---
layout: docs20-cn
title: Kylin ODBC Driver Tutorial
categories: tutorial
permalink: /cn/docs20/tutorial/odbc.html
version: v1.2
since: v0.7.1
---

> We provide the Kylin ODBC driver to enable data access from ODBC-compatible client applications.
>
> Both 32-bit and 64-bit versions of the driver are available.
>
> Tested operating systems: Windows 7, Windows Server 2008 R2
>
> Tested applications: Tableau 8.0.4 and Tableau 8.1.3

## Prerequisites
1. Microsoft Visual C++ 2012 Redistributable
   * For 32-bit Windows or 32-bit Tableau Desktop, download the [32bit version](http://download.microsoft.com/download/1/6/B/16B06F60-3B20-4FF2-B699-5E9B7962F9AE/VSU_4/vcredist_x86.exe)
   * For 64-bit Windows or 64-bit Tableau Desktop, download the [64bit version](http://download.microsoft.com/download/1/6/B/16B06F60-3B20-4FF2-B699-5E9B7962F9AE/VSU_4/vcredist_x64.exe)

2. The ODBC driver internally gets results from a REST server, so make sure you have access to one.

## Installation
1. If you have already installed the Kylin ODBC driver, uninstall the existing one first.
2. Download the driver installer from [Download](../../download/) and run it.
   * For 32-bit Tableau Desktop, install KylinODBCDriver (x86).exe
   * For 64-bit Tableau Desktop, install KylinODBCDriver (x64).exe

3. Once both drivers are installed on Tableau Server, you should be able to publish there without issues.

## Bug Report
If you run into problems, please report bugs in the Apache Kylin JIRA, or send an email to the dev mailing list.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/odbc.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/odbc.md b/website/_docs20/tutorial/odbc.md
new file mode 100644
index 0000000..f386fd6
--- /dev/null
+++ b/website/_docs20/tutorial/odbc.md
@@ -0,0 +1,49 @@
---
layout: docs20
title: Kylin ODBC Driver
categories: tutorial
permalink: /docs20/tutorial/odbc.html
since: v0.7.1
---

> We provide the Kylin ODBC driver to enable data access from ODBC-compatible client applications.
>
> Both 32-bit and 64-bit versions of the driver are available.
>
> Tested operating systems: Windows 7, Windows Server 2008 R2
>
> Tested applications: Tableau 8.0.4, Tableau 8.1.3 and Tableau 9.1

## Prerequisites
1. Microsoft Visual C++ 2012 Redistributable
	* For 32 bit Windows or 32 bit Tableau Desktop: Download: [32bit version](http://download.microsoft.com/download/1/6/B/16B06F60-3B20-4FF2-B699-5E9B7962F9AE/VSU_4/vcredist_x86.exe)
	* For 64 bit Windows or 64 bit Tableau Desktop: Download: [64bit version](http://download.microsoft.com/download/1/6/B/16B06F60-3B20-4FF2-B699-5E9B7962F9AE/VSU_4/vcredist_x64.exe)


2. The ODBC driver internally gets results from a REST server; make sure you have access to one.

## Installation
1. Uninstall the existing Kylin ODBC driver first, if you have installed it before.
2. Download the ODBC driver from [download](../../download/).
	* For 32 bit Tableau Desktop: please install KylinODBCDriver (x86).exe
	* For 64 bit Tableau Desktop: please install KylinODBCDriver (x64).exe

3. Once both drivers are installed on Tableau Server, you should be able to publish there without issues.

## DSN configuration
1. Open ODBCAD to configure a DSN.
	* For the 32 bit driver, please use the 32 bit version in C:\Windows\SysWOW64\odbcad32.exe
	* For the 64 bit driver, please use the default "Data Sources (ODBC)" in Control Panel/Administrative Tools


2. Open the "System DSN" tab and click "Add"; you will see KylinODBCDriver listed as an option. Click "Finish" to continue.


3. In the pop-up dialog, fill in all the blanks. The server host is where your Kylin REST server is started.


4. Click "Done", and you will see your new DSN listed in the "System Data Sources"; you can use this DSN afterwards.


## Bug Report
Please open an Apache Kylin JIRA to report bugs, or send an email to the dev mailing list.
http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/powerbi.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/powerbi.cn.md b/website/_docs20/tutorial/powerbi.cn.md
new file mode 100644
index 0000000..9326a82
--- /dev/null
+++ b/website/_docs20/tutorial/powerbi.cn.md
@@ -0,0 +1,56 @@
---
layout: docs20-cn
title: MS Excel and Power BI Tutorial
categories: tutorial
permalink: /cn/docs20/tutorial/powerbi.html
version: v1.2
since: v1.2
---

Microsoft Excel is one of the most popular data processing tools on the Windows platform. It supports many kinds of data processing functions and, with Power Query, can read data from an ODBC data source and load it into spreadsheets.

Microsoft Power BI is a professional business intelligence analysis tool from Microsoft, providing users with simple yet rich data visualization and analysis functions.

> The current version of Apache Kylin does not support querying raw data. Some queries may fail because of this and cause exceptions in the application; applying the KYLIN-1075 patch is recommended for a better display of query results.


> Power BI and Excel do not support the "connect live" mode. Please pay attention and add a where condition when querying a very large dataset, to avoid pulling too much data from the server to the client; in some cases the query may even fail.

### Install ODBC Driver
Refer to the page [Kylin ODBC Driver Tutorial](./odbc.html). Please make sure to download and install Kylin ODBC Driver __v1.2__. If you have an earlier version installed, please uninstall it first and then install the new one.

### Connect Excel to Kylin
1. Download and install Power Query from the Microsoft website. After installation a Power Query fast tab will appear in Excel; click the `From other sources` drop-down button and select the `From ODBC` item.

2. In the pop-up `From ODBC` data connection wizard, enter the connection string of the Apache Kylin server, and optionally enter the SQL statement you want to execute in the `SQL` text box. Click `OK`, and the result of the SQL statement will be loaded into an Excel spreadsheet immediately.

> To simplify the connection string, it is recommended to create a DSN for Apache Kylin, which shortens the connection string to DSN=[YOUR_DSN_NAME]. For creating a DSN, please refer to [https://support.microsoft.com/en-us/kb/305599](https://support.microsoft.com/en-us/kb/305599).

3. If you choose not to enter a SQL statement, Power Query will list all the database tables, and you can load data from a whole table as needed. However, since Apache Kylin does not yet support querying raw data, loading some tables may be limited.

4. Wait a moment, and the data is successfully loaded into Excel.

5. Once the data on the server side is updated, the data in Excel needs to be synced; right-click the data source in the list on the right and select `Refresh`, and the latest data will be updated into the spreadsheet.

6. To improve performance, you can open the `Query Options` settings in Power Query and enable `Fast data load`; this will speed up data loading, but may make the UI temporarily unresponsive.

### Power BI
1. Start the Power BI Desktop program you installed, click the `Get data` button, and select ODBC as the data source.

2. In the pop-up `From ODBC` data connection wizard, enter the database connection string of the Apache Kylin server, and optionally enter the SQL statement you want to execute in the `SQL` text box. Click `OK`, and the result of the SQL statement will be loaded into Power BI immediately.

3. If you choose not to enter a SQL statement, Power BI will list all the tables in the project, and you can load data from a whole table as needed. However, since Apache Kylin does not yet support querying raw data, loading some tables may be limited.

4. Now you can go further and use Power BI for visual analysis:

5. Click the `Refresh` button on the toolbar to reload the data and update the charts.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/powerbi.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/powerbi.md b/website/_docs20/tutorial/powerbi.md
new file mode 100644
index 0000000..5465c57
--- /dev/null
+++ b/website/_docs20/tutorial/powerbi.md
@@ -0,0 +1,54 @@
---
layout: docs20
title: MS Excel and Power BI
categories: tutorial
permalink: /docs20/tutorial/powerbi.html
since: v1.2
---

Microsoft Excel is one of the most famous data tools on the Windows platform, and has plenty of data analysis functions. With Power Query installed as a plug-in, Excel can easily read data from an ODBC data source and fill spreadsheets.
+
+Microsoft Power BI is a business intelligence tool that provides rich functionality and a good experience for data visualization and processing to users.
+
+> Apache Kylin doesn't support queries on raw data yet; some queries might fail and cause exceptions in the application. Applying patch KYLIN-1075 is recommended to get a better presentation of query results.
+
+> Power BI and Excel do not support the "connect live" mode for third-party ODBC drivers yet, so be careful when querying a huge dataset: it may pull too much data into your client, which can take a while or even fail in the end.
+
+### Install ODBC Driver
+Refer to this guide: [Kylin ODBC Driver Tutorial](./odbc.html).
+Please make sure to download and install Kylin ODBC Driver __v1.2__. If you already have an older ODBC driver installed in your system, please uninstall it first.
+
+### Kylin and Excel
+1. Download Power Query from Microsoft's website and install it. Then run Excel, switch to the `Power Query` fast tab, click the `From Other Sources` drop-down list, and select the `ODBC` item.
+
+2. You'll see the `From ODBC` dialog; type the database connection string of the Apache Kylin server in the `Connection String` textbox. Optionally, you can type a SQL statement in the `SQL statement` textbox. Click `OK`, and the result set will be loaded into your spreadsheet.
+
+> Tip: To simplify the database connection string, a DSN is recommended, which shortens the connection string to `DSN=[YOUR_DSN_NAME]`. For details about DSNs, refer to [https://support.microsoft.com/en-us/kb/305599](https://support.microsoft.com/en-us/kb/305599).
+
+3. If you didn't enter a SQL statement in the last step, Power Query will list all tables in the project, which means you can load data from a whole table. But since Apache Kylin cannot query raw data currently, this function may be limited.
+
+4. Hold on for a while, and the data will be in Excel.
+
+5. If you want to sync data with the Kylin server, just right-click the data source in the right panel and select `Refresh`; then you'll see the latest data.
+
+6. To improve data loading performance, you can enable `Fast data load` in Power Query, but this may make the UI unresponsive for a while.
+
+### Power BI
+1. Run Power BI Desktop, click the `Get Data` button, and select `ODBC` as the data source type.
+
+2. As with Excel, type the database connection string of the Apache Kylin server in the `Connection String` textbox, and optionally type a SQL statement in the `SQL statement` textbox. Click `OK`, and the result set will come into Power BI as a new data source query.
+
+3. If you didn't enter a SQL statement in the last step, Power BI will list all tables in the project, which means you can load data from a whole table. But since Apache Kylin cannot query raw data currently, this function may be limited.
+
+4. Now you can start to enjoy analyzing with Power BI.
+
+5. To reload the data and redraw the charts, just click the `Refresh` button on the `Home` fast tab.
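For reference, whatever goes into the `SQL statement` box of Power Query or Power BI is plain SQL against your cube. The sketch below is illustrative only: it assumes the learn_kylin sample project with the `kylin_sales` table, and the date in the where clause is an arbitrary example; as noted above, a where clause keeps the result small enough to pull into the client comfortably.

```sql
-- Example statement for the Power Query / Power BI "SQL statement" box.
-- Assumes the learn_kylin sample project with the kylin_sales table;
-- the WHERE clause limits how much data is pulled down to the client.
SELECT part_dt,
       SUM(price) AS total_price
FROM   kylin_sales
WHERE  part_dt >= DATE '2013-01-01'
GROUP  BY part_dt
```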
+
http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/squirrel.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/squirrel.md b/website/_docs20/tutorial/squirrel.md
new file mode 100644
index 0000000..7d0c9d9
--- /dev/null
+++ b/website/_docs20/tutorial/squirrel.md
@@ -0,0 +1,112 @@
+---
+layout: docs20
+title: Connect from SQuirreL
+categories: tutorial
+permalink: /docs20/tutorial/squirrel.html
+---
+
+### Introduction
+
+[SQuirreL SQL](http://www.squirrelsql.org/) is a multi-platform universal SQL client (GNU License). You can use it to access HBase + Phoenix and Hive. This document introduces how to connect to Kylin from SQuirreL.
+
+### Used Software
+
+* [Kylin v1.6.0](/download/) & ODBC 1.6
+* [SQuirreL SQL v3.7.1](http://www.squirrelsql.org/)
+
+## Pre-requisites
+
+* Find the Kylin JDBC driver jar: from the Kylin download page, choose the binary for the **correct version of Kylin and HBase**, then download and unpack it; the driver jar is in **./lib**.
+
+* You need an instance of Kylin with a cube; the [Sample Cube](kylin_sample.html) is enough.
+
+* [Download and install SQuirreL](http://www.squirrelsql.org/#installation)
+
+## Add Kylin JDBC Driver
+
+On the left menu, add a new driver and locate the JAR.
+
+Configure these parameters:
+
+* Put a name
+* Example URL:
+
+  jdbc:kylin://172.17.0.2:7070/learn_kylin
+* Put the class name.
+  Tip: If auto-complete does not work, type: org.apache.kylin.jdbc.Driver
+
+Check the driver list.
+
+## Add Aliases
+
+On the left menu, add a new alias (default login: ADMIN / KYLIN).
+
+The connection is then launched automatically.
+
+## Connect and Execute
+
+The startup window appears when connected.
+
+Choose the SQL tab and write a query (we use Kylin's sample cube):
+
+```
+select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
+from kylin_sales group by part_dt
+order by part_dt
+```
+
+Execute it, and it works!
+
+## Tips
+
+SQuirreL isn't the most stable SQL client, but it is very flexible and surfaces a lot of information; it can be used for PoCs and for checking connectivity issues.
+
+It can show:
+
+* the list of tables;
+* the list of columns of a table;
+* the list of columns of a query;
+* exports of query results;
+* info about query execution time.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/tableau.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/tableau.cn.md b/website/_docs20/tutorial/tableau.cn.md
new file mode 100644
index 0000000..e185b38
--- /dev/null
+++ b/website/_docs20/tutorial/tableau.cn.md
@@ -0,0 +1,116 @@
+---
+layout: docs20-cn
+title: Tableau Tutorial
+categories: tutorial
+permalink: /cn/docs20/tutorial/tableau.html
+version: v1.2
+since: v0.7.1
+---
+
+> There are some limitations when using the Kylin ODBC driver with Tableau; please read these notes carefully before you try it.
+> * Only the "managed" analysis path is supported; the Kylin engine will raise errors for unexpected dimensions or measures
+> * Always select the fact table first, then add lookup tables with the correct join conditions (the join types already defined in the cube)
+> * Do not try to join multiple fact tables or multiple lookup tables;
+> * You can try to use high-cardinality dimensions such as seller id in a Tableau filter, but the engine will only return a limited number of seller ids in the filter for now.
+>
+> For more details or any questions, please contact the Kylin team: `kylino...@gmail.com`
+
+### For Tableau 9.x Users
+Please refer to the [Tableau 9 Tutorial](./tableau_91.html) for a more detailed guide.
+
+### Step 1. Install the Kylin ODBC Driver
+Refer to this page: [Kylin ODBC Driver Tutorial](./odbc.html).
+
+### Step 2. Connect to the Kylin Server
+> We recommend using Connect Using Driver instead of Using DSN.
+
+Connect Using Driver: select "Other Database(ODBC)" in the left panel and choose "KylinODBCDriver" in the pop-up window.
+
+Enter your server location and credentials: server host, port, username and password.
+
+Click "Connect" to get the list of projects that you have permission to access. See details about permissions in the [Kylin Cube Permission Grant Tutorial](https://github.com/KylinOLAP/Kylin/wiki/Kylin-Cube-Permission-Grant-Tutorial). Then choose the project you want to connect to in the drop-down list.
+
+Click "Done" to complete the connection.
+
+### Step 3. Use a Single Table or Multiple Tables
+> Limitations
+> * The fact table must be selected first
+> * Selecting from lookup tables only is not supported
+> * The join conditions must match the cube definition
+
+**Select the Fact Table**
+
+Select `Multiple Tables`.
+
+Then click `Add Table...` to add the fact table.
+
+**Select Lookup Tables**
+
+Click `Add Table...` to add a lookup table.
+
+Set up the join clause carefully.
+
+Keep adding tables by clicking `Add Table...` until all the lookup tables have been added properly. Give this connection a name for use in Tableau.
+
+**Use Connect Live**
+
+There are three types of `Data Connection`. Choose the `Connect Live` option.
+
+Then you can enjoy analyzing with Tableau.
+
+**Add Additional Lookup Tables**
+
+Click `Data` in the top menu bar and select `Edit Tables...` to update the lookup table information.
+
+### Step 4. Use Custom SQL
+Using custom SQL is similar to using a single table or multiple tables, except that you paste your SQL into the `Custom SQL` tab and then follow the same instructions as above.
+
+### Step 5. Publish to Tableau Server
+Once you have finished making a dashboard with Tableau, you can publish it to Tableau Server.
+Click `Server` in the top menu bar and select `Publish Workbook...`.
+
+Then sign in to your Tableau Server and prepare to publish.
+
+If you are using Connect Using Driver rather than a DSN connection, you will also need to embed your password. Click the `Authentication` button at the bottom left and select `Embedded Password`. Click `Publish` and you will see the result.
+
+### Tips
+* Hiding table names in Tableau
+
+  * Tableau will display columns grouped by source table name, but users may want to organize columns in a different structure. Use "Group by Folder" in Tableau and create folders to group different columns.

http://git-wip-us.apache.org/repos/asf/kylin/blob/7ea64f38/website/_docs20/tutorial/tableau.md
----------------------------------------------------------------------
diff --git a/website/_docs20/tutorial/tableau.md b/website/_docs20/tutorial/tableau.md
new file mode 100644
index 0000000..e46b4e6
--- /dev/null
+++ b/website/_docs20/tutorial/tableau.md
@@ -0,0 +1,113 @@
+---
+layout: docs20
+title: Tableau 8
+categories: tutorial
+permalink: /docs20/tutorial/tableau.html
+---
+
+> There are some limitations of the Kylin ODBC driver with Tableau; please read these instructions carefully before you try it.
+>
+> * Only the "managed" analysis path is supported; the Kylin engine will raise an exception for unexpected dimensions or metrics
+> * Please always select the fact table first, then add lookup tables with the correct join condition (the join type defined in the cube)
+> * Do not try to join between fact tables or between lookup tables;
+> * You can try to use high-cardinality dimensions like seller id as a Tableau filter, but the engine will only return a limited number of seller ids in Tableau's filter for now.
+
+### For Tableau 9.x Users
+Please refer to the [Tableau 9.x Tutorial](./tableau_91.html) for a detailed guide.
+
+### Step 1. Install Kylin ODBC Driver
+Refer to this guide: [Kylin ODBC Driver Tutorial](./odbc.html).
+
+### Step 2. Connect to Kylin Server
+> We recommend using Connect Using Driver instead of Using DSN.
+
+Connect Using Driver: Select "Other Database(ODBC)" in the left panel and choose KylinODBCDriver in the pop-up window.
+
+Enter your server location and credentials: server host, port, username and password.
+
+Click "Connect" to get the list of projects that you have permission to access. See details about permissions in the [Kylin Cube Permission Grant Tutorial](./acl.html).
Then choose the project you want to connect to in the drop-down list.
+
+Click "Done" to complete the connection.
+
+### Step 3. Use a Single Table or Multiple Tables
+> Limitations
+>
+> * The FACT table must be selected first
+> * Selecting from a lookup table only is not supported
+> * The join condition must match the cube definition
+
+**Select the Fact Table**
+
+Select `Multiple Tables`.
+
+Then click `Add Table...` to add a fact table.
+
+**Select Look-up Tables**
+
+Click `Add Table...` to add a look-up table.
+
+Set up the join clause carefully.
+
+Keep adding tables by clicking `Add Table...` until all the look-up tables have been added properly. Give the connection a name for use in Tableau.
+
+**Use Connect Live**
+
+There are three types of `Data Connection`. Choose the `Connect Live` option.
+
+Then you can enjoy analyzing with Tableau.
+
+**Add additional look-up tables**
+
+Click `Data` in the top menu bar, then select `Edit Tables...` to update the look-up table information.
+
+### Step 4. Use Customized SQL
+Using customized SQL is similar to using a single table or multiple tables, except that you paste your SQL into the `Custom SQL` tab and follow the same instructions as above. See the example sketch at the end of this page.
+
+### Step 5. Publish to Tableau Server
+Once you have finished making a dashboard with Tableau, you can publish it to Tableau Server.
+Click `Server` in the top menu bar and select `Publish Workbook...`.
+
+Then sign in to your Tableau Server and prepare to publish.
+
+If you are using Connect Using Driver instead of a DSN connection, you will also need to embed your password. Click the `Authentication` button at the bottom left and select `Embedded Password`. Click `Publish` and you will see the result.
+
+### Tips
+* Hide table names in Tableau
+
+  * Tableau will display columns grouped by source table name, but users may want to organize columns with a different structure. Use "Group by Folder" in Tableau and create folders to group different columns.
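To complement Step 4 of the Tableau guide above, here is a sketch of the kind of statement that can be pasted into the `Custom SQL` tab. It is illustrative only: it assumes the learn_kylin sample project with the `kylin_sales` fact table, and it further assumes that the cube model joins that fact table to a lookup table named `kylin_cal_dt` on `part_dt = cal_dt` — check your own cube's model and adjust accordingly, since the join written in the SQL must match the join defined in the cube.

```sql
-- Sketch of a Custom SQL statement for Tableau.
-- Assumed setup: learn_kylin sample project, fact table kylin_sales,
-- lookup table kylin_cal_dt joined on part_dt = cal_dt (verify in your cube model).
-- Only dimensions and measures defined in the cube may be referenced.
SELECT f.part_dt,
       SUM(f.price)                AS total_price,
       COUNT(DISTINCT f.seller_id) AS sellers
FROM   kylin_sales f
INNER JOIN kylin_cal_dt d
        ON f.part_dt = d.cal_dt
GROUP  BY f.part_dt
ORDER  BY f.part_dt
```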